:: Library Catalog

Bibliographic Details
Main Authors:	Anagnostidis, Sotiris; Pavllo, Dario; Biggio, Luca; Noci, Lorenzo; Lucchi, Aurelien; Hofmann, Thomas
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2305.15805

Similar Items

How Susceptible are LLMs to Influence in Prompts?
by: Anagnostidis, Sotiris, et al.
Published: (2024)

Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
by: Dong, Yihe, et al.
Published: (2025)

Thinking into the Future: Latent Lookahead Training for Transformers
by: Noci, Lorenzo, et al.
Published: (2026)

Towards Meta-Pruning via Optimal Transport
by: Theus, Alexander, et al.
Published: (2024)

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
by: Anagnostidis, Sotiris, et al.
Published: (2023)

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
by: Bachmann, Gregor, et al.
Published: (2025)

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
by: Fu, Qichen, et al.
Published: (2024)

On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study
by: Alberghi, Riccardo, et al.
Published: (2025)

Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers
by: Bill, Eric Tillman, et al.
Published: (2025)

Transformer Fusion with Optimal Transport
by: Imfeld, Moritz, et al.
Published: (2023)

A Language Model's Guide Through Latent Space
by: von Rütte, Dimitri, et al.
Published: (2024)

Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)

Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management
by: Cui, Guanyu, et al.
Published: (2026)

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
by: Federici, Marco, et al.
Published: (2024)

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement
by: Marwah, Riju, et al.
Published: (2026)

Earley-Driven Dynamic Pruning for Efficient Structured Decoding
by: Sun, Xintong, et al.
Published: (2025)

Mitigating Copy Bias in In-Context Learning through Neuron Pruning
by: Ali, Ameen, et al.
Published: (2024)

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
by: Zheng, Chenyu, et al.
Published: (2024)

Context Dependence and Reliability in Autoregressive Language Models
by: Sengupta, Poushali, et al.
Published: (2026)

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
by: Neo, Clement, et al.
Published: (2024)

Multipole Attention for Efficient Long Context Reasoning
by: Hooper, Coleman, et al.
Published: (2025)

Efficient Mathematical Reasoning Models via Dynamic Pruning and Knowledge Distillation
by: Yu, Fengming, et al.
Published: (2025)

HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
by: He, Zifan, et al.
Published: (2024)

On the Emergence of Induction Heads for In-Context Learning
by: Musat, Tiberiu, et al.
Published: (2025)

SAP: Syntactic Attention Pruning for Transformer-based Language Models
by: Lee, Tzu-Yun, et al.
Published: (2025)

Does Transformer Interpretability Transfer to RNNs?
by: Paulo, Gonçalo, et al.
Published: (2024)

Adaptive Computation Pruning for the Forgetting Transformer
by: Lin, Zhixuan, et al.
Published: (2025)

Generalized Linear Mode Connectivity for Transformers
by: Theus, Alexander, et al.
Published: (2025)

Pruning Literals for Highly Efficient Explainability at Word Level
by: Yadav, Rohan Kumar, et al.
Published: (2024)

VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
by: Goel, Raghavv, et al.
Published: (2025)

On Importance of Pruning and Distillation for Efficient Low Resource NLP
by: Mirashi, Aishwarya, et al.
Published: (2024)

Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
by: Zhang, Ruixiang, et al.
Published: (2025)

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
by: Ramesh, Shyam Sundhar, et al.
Published: (2026)

Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models
by: Tayaranian, Mohammadreza, et al.
Published: (2024)

Cross-Platform Digital Discourse Analysis of the Israel-Hamas Conflict: Sentiment, Topics, and Event Dynamics
by: Antonakaki, Despoina, et al.
Published: (2025)

Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers
by: Huang, Yiran, et al.
Published: (2026)

Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph
by: Irie, Kazuki
Published: (2024)

Improving Autoregressive Training with Dynamic Oracles
by: Yang, Jianing, et al.
Published: (2024)

CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
by: Capps, Chad A.
Published: (2026)

Mechanistic Interpretability of Binary and Ternary Transformers
by: Li, Jason
Published: (2024)