| Main Authors: | Anagnostidis, Sotiris; Pavllo, Dario; Biggio, Luca; Noci, Lorenzo; Lucchi, Aurelien; Hofmann, Thomas |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2305.15805 |
Similar Items
How Susceptible are LLMs to Influence in Prompts?
by: Anagnostidis, Sotiris, et al.
Published: (2024)
Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
by: Dong, Yihe, et al.
Published: (2025)
Thinking into the Future: Latent Lookahead Training for Transformers
by: Noci, Lorenzo, et al.
Published: (2026)
Towards Meta-Pruning via Optimal Transport
by: Theus, Alexander, et al.
Published: (2024)
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
by: Anagnostidis, Sotiris, et al.
Published: (2023)
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
by: Bachmann, Gregor, et al.
Published: (2025)
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
by: Fu, Qichen, et al.
Published: (2024)
On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study
by: Alberghi, Riccardo, et al.
Published: (2025)
Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers
by: Bill, Eric Tillman, et al.
Published: (2025)
Transformer Fusion with Optimal Transport
by: Imfeld, Moritz, et al.
Published: (2023)
A Language Model's Guide Through Latent Space
by: von Rütte, Dimitri, et al.
Published: (2024)
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)
Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management
by: Cui, Guanyu, et al.
Published: (2026)
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
by: Federici, Marco, et al.
Published: (2024)
Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement
by: Marwah, Riju, et al.
Published: (2026)
Earley-Driven Dynamic Pruning for Efficient Structured Decoding
by: Sun, Xintong, et al.
Published: (2025)
Mitigating Copy Bias in In-Context Learning through Neuron Pruning
by: Ali, Ameen, et al.
Published: (2024)
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
by: Zheng, Chenyu, et al.
Published: (2024)
Context Dependence and Reliability in Autoregressive Language Models
by: Sengupta, Poushali, et al.
Published: (2026)
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
by: Neo, Clement, et al.
Published: (2024)
Multipole Attention for Efficient Long Context Reasoning
by: Hooper, Coleman, et al.
Published: (2025)
Efficient Mathematical Reasoning Models via Dynamic Pruning and Knowledge Distillation
by: Yu, Fengming, et al.
Published: (2025)
HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
by: He, Zifan, et al.
Published: (2024)
On the Emergence of Induction Heads for In-Context Learning
by: Musat, Tiberiu, et al.
Published: (2025)
SAP: Syntactic Attention Pruning for Transformer-based Language Models
by: Lee, Tzu-Yun, et al.
Published: (2025)
Does Transformer Interpretability Transfer to RNNs?
by: Paulo, Gonçalo, et al.
Published: (2024)
Adaptive Computation Pruning for the Forgetting Transformer
by: Lin, Zhixuan, et al.
Published: (2025)
Generalized Linear Mode Connectivity for Transformers
by: Theus, Alexander, et al.
Published: (2025)
Pruning Literals for Highly Efficient Explainability at Word Level
by: Yadav, Rohan Kumar, et al.
Published: (2024)
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
by: Goel, Raghavv, et al.
Published: (2025)
On Importance of Pruning and Distillation for Efficient Low Resource NLP
by: Mirashi, Aishwarya, et al.
Published: (2024)
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
by: Zhang, Ruixiang, et al.
Published: (2025)
Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
by: Ramesh, Shyam Sundhar, et al.
Published: (2026)
Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models
by: Tayaranian, Mohammadreza, et al.
Published: (2024)
Cross-Platform Digital Discourse Analysis of the Israel-Hamas Conflict: Sentiment, Topics, and Event Dynamics
by: Antonakaki, Despoina, et al.
Published: (2025)
Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers
by: Huang, Yiran, et al.
Published: (2026)
Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph
by: Irie, Kazuki
Published: (2024)
Improving Autoregressive Training with Dynamic Oracles
by: Yang, Jianing, et al.
Published: (2024)
CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
by: Capps, Chad A.
Published: (2026)
Mechanistic Interpretability of Binary and Ternary Transformers
by: Li, Jason
Published: (2024)