| Main Authors: | Szatkowski, Filip, Będkowski, Patryk, Devoto, Alessio, Dubiński, Jan, Minervini, Pasquale, Piórczyński, Mikołaj, Scardapane, Simone, Wójcik, Bartosz |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.00454 |
Similar Items
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
by: Szatkowski, Filip, et al.
Published: (2023)
Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
by: Wójcik, Bartosz, et al.
Published: (2023)
ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts
by: Będkowski, Patryk, et al.
Published: (2025)
A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression
by: Devoto, Alessio, et al.
Published: (2024)
Efficient LLM Moderation with Multi-Layer Latent Prototypes
by: Chrabąszcz, Maciej, et al.
Published: (2025)
Deep Generative Models for Proton Zero Degree Calorimeter Simulations in ALICE, CERN
by: Będkowski, Patryk, et al.
Published: (2024)
Conditional computation in neural networks: principles and research trends
by: Scardapane, Simone, et al.
Published: (2024)
Class incremental learning with probability dampening and cascaded gated classifier
by: Pomponi, Jary, et al.
Published: (2024)
Rethinking Calibration for Early-Exit Neural Networks
by: Kubaty, Piotr, et al.
Published: (2025)
Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning
by: Devoto, Alessio, et al.
Published: (2024)
Adaptive Semantic Token Selection for AI-native Goal-oriented Communications
by: Devoto, Alessio, et al.
Published: (2024)
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression
by: Godey, Nathan, et al.
Published: (2025)
Adaptive Semantic Token Communication for Transformer-based Edge Inference
by: Devoto, Alessio, et al.
Published: (2025)
Goal-oriented Communications based on Recursive Early Exit Neural Networks
by: Pomponi, Jary, et al.
Published: (2024)
Attention Sinks in Diffusion Language Models
by: Rulli, Maximo Eduardo, et al.
Published: (2025)
Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land
by: Scardapane, Simone
Published: (2024)
Interpretable Classification of Levantine Ceramic Thin Sections via Neural Networks
by: Capriotti, Sara, et al.
Published: (2025)
Low-Rank Compression of Language Models via Differentiable Rank Selection
by: Sundrani, Sidhant, et al.
Published: (2025)
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
by: Tyukin, Georgy, et al.
Published: (2024)
Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection
by: Genovese, Donatella, et al.
Published: (2025)
Efficient Multi-Source Knowledge Transfer by Model Merging
by: Osial, Marcin, et al.
Published: (2025)
An Auditing Test To Detect Behavioral Shift in Language Models
by: Richter, Leo, et al.
Published: (2024)
Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional Attributions
by: Sobieski, Bartlomiej, et al.
Published: (2026)
Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation
by: Bartkowiak, Patryk, et al.
Published: (2025)
Topological Deep Learning with State-Space Models: A Mamba Approach for Simplicial Complexes
by: Montagna, Marco, et al.
Published: (2024)
Activation Sparsity Opportunities for Compressing General Large Language Models
by: Dhar, Nobel, et al.
Published: (2024)
ActTail: Global Activation Sparsity in Large Language Models
by: Hou, Wenwen, et al.
Published: (2026)
Generative Diffusion Models for Fast Simulations of Particle Collisions at CERN
by: Kita, Mikołaj, et al.
Published: (2024)
SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages
by: Ghazaryan, Gayane, et al.
Published: (2024)
Exploring the Stability Gap in Continual Learning: The Role of the Classification Head
by: Łapacz, Wojciech, et al.
Published: (2024)
Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization
by: Nowak, Aleksandra Irena, et al.
Published: (2024)
Neurosymbolic Diffusion Models
by: van Krieken, Emile, et al.
Published: (2025)
Analysing the Residual Stream of Language Models Under Knowledge Conflicts
by: Zhao, Yu, et al.
Published: (2024)
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
by: Luo, Yuqi, et al.
Published: (2024)
Jailbreaking Vision-Language Models Through the Visual Modality
by: Azulay, Aharon, et al.
Published: (2026)
Interpreting Temporal Graph Neural Networks with Koopman Theory
by: Guerra, Michele, et al.
Published: (2024)
CDI: Copyrighted Data Identification in Diffusion Models
by: Dubiński, Jan, et al.
Published: (2024)
Probing the Emergence of Cross-lingual Alignment during LLM Training
by: Wang, Hetong, et al.
Published: (2024)
Privacy Attacks on Image AutoRegressive Models
by: Kowalczuk, Antoni, et al.
Published: (2025)
Temporal Smoothness Regularisers for Neural Link Predictors
by: Dileo, Manuel, et al.
Published: (2023)