| Main Authors: | Goldstein, Daniel; Alcaide, Eric; Lu, Janna; Cheah, Eugene |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.03005 |
Similar Items
Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
by: Goldstein, Daniel, et al.
Published: (2026)
RWKV-7 "Goose" with Expressive Dynamic State Evolution
by: Peng, Bo, et al.
Published: (2025)
Forget Attention: Importance-Aware Attention Is All You Need
by: Shin, Soohyeong, et al.
Published: (2026)
Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
by: Dhole, Kaustubh
Published: (2025)
AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
by: Höth, Max Henning, et al.
Published: (2026)
Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
by: Fu, Tianyu, et al.
Published: (2024)
Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text
by: Oketunji, Abiodun Finbarrs
Published: (2023)
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks
by: Nielsen, Dan Saattrup, et al.
Published: (2024)
SSSD: Simply-Scalable Speculative Decoding
by: Marzollo, Michele, et al.
Published: (2024)
Attention Drift: What Autoregressive Speculative Decoding Models Learn
by: Eldenk, Doğaç, et al.
Published: (2026)
Dodo: Dynamic Contextual Compression for Decoder-only LMs
by: Qin, Guanghui, et al.
Published: (2023)
Weakly Supervised Distillation of Hallucination Signals into Transformer Representations
by: Salehmohamed, Shoaib Sadiq, et al.
Published: (2026)
Large Language Model (LLM) Bias Index -- LLMBI
by: Oketunji, Abiodun Finbarrs, et al.
Published: (2023)
Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition
by: Labadie-Tamayo, Roberto, et al.
Published: (2025)
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
by: Zhang, Gongbo, et al.
Published: (2026)
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
by: Xu, Shuyao, et al.
Published: (2025)
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
by: Liu, Ming
Published: (2026)
Revisiting Intermediate-Layer Matching in Knowledge Distillation: Layer-Selection Strategy Doesn't Matter (Much)
by: Yu, Zony, et al.
Published: (2025)
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
by: Wang, Fali, et al.
Published: (2025)
Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
by: Hong, Colin, et al.
Published: (2025)
Evaluating Explainable AI Attribution Methods in Neural Machine Translation via Attention-Guided Knowledge Distillation
by: Nourbakhsh, Aria, et al.
Published: (2026)
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
by: Zhou, Qirui, et al.
Published: (2025)
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
by: Wang, Fali, et al.
Published: (2025)
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
by: Jin, Heegon, et al.
Published: (2024)
ALISON: Fast and Effective Stylometric Authorship Obfuscation
by: Xing, Eric, et al.
Published: (2024)
Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)
Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)
Softmax Linear Attention: Reclaiming Global Competition
by: Xu, Mingwei, et al.
Published: (2026)
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
by: Anshumann, et al.
Published: (2025)
Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation
by: Shafique, Muhammad Ali, et al.
Published: (2025)
Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions
by: Yun, Taedong, et al.
Published: (2025)
DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
by: Wang, Ziyi, et al.
Published: (2026)
Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
by: Peters, Sydney, et al.
Published: (2025)
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
by: Cha, Jungyoub, et al.
Published: (2025)
ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering
by: Ghosh, Shubhra, et al.
Published: (2025)
Dealing with Annotator Disagreement in Hate Speech Classification
by: Dehghan, Somaiyeh, et al.
Published: (2025)
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
by: Hong, Chunsan, et al.
Published: (2025)
On Explaining with Attention Matrices
by: Naim, Omar, et al.
Published: (2024)
A Multi-Encoder Frozen-Decoder Approach for Fine-Tuning Large Language Models
by: Dhole, Kaustubh D.
Published: (2025)
AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
by: Simplício, Afonso, et al.
Published: (2026)