:: Library Catalog

Bibliographic Details
Main Authors:	Szatkowski, Filip; Będkowski, Patryk; Devoto, Alessio; Dubiński, Jan; Minervini, Pasquale; Piórczyński, Mikołaj; Scardapane, Simone; Wójcik, Bartosz
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2509.00454

Similar Items

Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
by: Szatkowski, Filip, et al.
Published: (2023)

Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
by: Wójcik, Bartosz, et al.
Published: (2023)

ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts
by: Będkowski, Patryk, et al.
Published: (2025)

A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression
by: Devoto, Alessio, et al.
Published: (2024)

Efficient LLM Moderation with Multi-Layer Latent Prototypes
by: Chrabąszcz, Maciej, et al.
Published: (2025)

Deep Generative Models for Proton Zero Degree Calorimeter Simulations in ALICE, CERN
by: Będkowski, Patryk, et al.
Published: (2024)

Conditional computation in neural networks: principles and research trends
by: Scardapane, Simone, et al.
Published: (2024)

Class incremental learning with probability dampening and cascaded gated classifier
by: Pomponi, Jary, et al.
Published: (2024)

Rethinking Calibration for Early-Exit Neural Networks
by: Kubaty, Piotr, et al.
Published: (2025)

Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning
by: Devoto, Alessio, et al.
Published: (2024)

Adaptive Semantic Token Selection for AI-native Goal-oriented Communications
by: Devoto, Alessio, et al.
Published: (2024)

Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression
by: Godey, Nathan, et al.
Published: (2025)

Adaptive Semantic Token Communication for Transformer-based Edge Inference
by: Devoto, Alessio, et al.
Published: (2025)

Goal-oriented Communications based on Recursive Early Exit Neural Networks
by: Pomponi, Jary, et al.
Published: (2024)

Attention Sinks in Diffusion Language Models
by: Rulli, Maximo Eduardo, et al.
Published: (2025)

Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land
by: Scardapane, Simone
Published: (2024)

Interpretable Classification of Levantine Ceramic Thin Sections via Neural Networks
by: Capriotti, Sara, et al.
Published: (2025)

Low-Rank Compression of Language Models via Differentiable Rank Selection
by: Sundrani, Sidhant, et al.
Published: (2025)

Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
by: Tyukin, Georgy, et al.
Published: (2024)

Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection
by: Genovese, Donatella, et al.
Published: (2025)

Efficient Multi-Source Knowledge Transfer by Model Merging
by: Osial, Marcin, et al.
Published: (2025)

An Auditing Test To Detect Behavioral Shift in Language Models
by: Richter, Leo, et al.
Published: (2024)

Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional Attributions
by: Sobieski, Bartlomiej, et al.
Published: (2026)

Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation
by: Bartkowiak, Patryk, et al.
Published: (2025)

Topological Deep Learning with State-Space Models: A Mamba Approach for Simplicial Complexes
by: Montagna, Marco, et al.
Published: (2024)

Activation Sparsity Opportunities for Compressing General Large Language Models
by: Dhar, Nobel, et al.
Published: (2024)

ActTail: Global Activation Sparsity in Large Language Models
by: Hou, Wenwen, et al.
Published: (2026)

Generative Diffusion Models for Fast Simulations of Particle Collisions at CERN
by: Kita, Mikołaj, et al.
Published: (2024)

SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages
by: Ghazaryan, Gayane, et al.
Published: (2024)

Exploring the Stability Gap in Continual Learning: The Role of the Classification Head
by: Łapacz, Wojciech, et al.
Published: (2024)

Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization
by: Nowak, Aleksandra Irena, et al.
Published: (2024)

Neurosymbolic Diffusion Models
by: van Krieken, Emile, et al.
Published: (2025)

Analysing the Residual Stream of Language Models Under Knowledge Conflicts
by: Zhao, Yu, et al.
Published: (2024)

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
by: Luo, Yuqi, et al.
Published: (2024)

Jailbreaking Vision-Language Models Through the Visual Modality
by: Azulay, Aharon, et al.
Published: (2026)

Interpreting Temporal Graph Neural Networks with Koopman Theory
by: Guerra, Michele, et al.
Published: (2024)

CDI: Copyrighted Data Identification in Diffusion Models
by: Dubiński, Jan, et al.
Published: (2024)

Probing the Emergence of Cross-lingual Alignment during LLM Training
by: Wang, Hetong, et al.
Published: (2024)

Privacy Attacks on Image AutoRegressive Models
by: Kowalczuk, Antoni, et al.
Published: (2025)

Temporal Smoothness Regularisers for Neural Link Predictors
by: Dileo, Manuel, et al.
Published: (2023)