:: Library Catalog

Bibliographic Details
Main Authors:	Goldstein, Daniel; Alcaide, Eric; Lu, Janna; Cheah, Eugene
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning; I.2.7
Online Access:	https://arxiv.org/abs/2505.03005

Similar Items

Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
by: Goldstein, Daniel, et al.
Published: (2026)

RWKV-7 "Goose" with Expressive Dynamic State Evolution
by: Peng, Bo, et al.
Published: (2025)

Forget Attention: Importance-Aware Attention Is All You Need
by: Shin, Soohyeong, et al.
Published: (2026)

Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
by: Dhole, Kaustubh
Published: (2025)

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
by: Höth, Max Henning, et al.
Published: (2026)

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
by: Fu, Tianyu, et al.
Published: (2024)

Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text
by: Oketunji, Abiodun Finbarrs
Published: (2023)

Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks
by: Nielsen, Dan Saattrup, et al.
Published: (2024)

SSSD: Simply-Scalable Speculative Decoding
by: Marzollo, Michele, et al.
Published: (2024)

Attention Drift: What Autoregressive Speculative Decoding Models Learn
by: Eldenk, Doğaç, et al.
Published: (2026)

Dodo: Dynamic Contextual Compression for Decoder-only LMs
by: Qin, Guanghui, et al.
Published: (2023)

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations
by: Salehmohamed, Shoaib Sadiq, et al.
Published: (2026)

Large Language Model (LLM) Bias Index -- LLMBI
by: Oketunji, Abiodun Finbarrs, et al.
Published: (2023)

Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition
by: Labadie-Tamayo, Roberto, et al.
Published: (2025)

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
by: Zhang, Gongbo, et al.
Published: (2026)

Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
by: Xu, Shuyao, et al.
Published: (2025)

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
by: Liu, Ming
Published: (2026)

Revisiting Intermediate-Layer Matching in Knowledge Distillation: Layer-Selection Strategy Doesn't Matter (Much)
by: Yu, Zony, et al.
Published: (2025)

Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
by: Wang, Fali, et al.
Published: (2025)

Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
by: Hong, Colin, et al.
Published: (2025)

Evaluating Explainable AI Attribution Methods in Neural Machine Translation via Attention-Guided Knowledge Distillation
by: Nourbakhsh, Aria, et al.
Published: (2026)

QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
by: Zhou, Qirui, et al.
Published: (2025)

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
by: Wang, Fali, et al.
Published: (2025)

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
by: Jin, Heegon, et al.
Published: (2024)

ALISON: Fast and Effective Stylometric Authorship Obfuscation
by: Xing, Eric, et al.
Published: (2024)

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)

Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)

Softmax Linear Attention: Reclaiming Global Competition
by: Xu, Mingwei, et al.
Published: (2026)

Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
by: Anshumann, et al.
Published: (2025)

Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation
by: Shafique, Muhammad Ali, et al.
Published: (2025)

Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions
by: Yun, Taedong, et al.
Published: (2025)

DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
by: Wang, Ziyi, et al.
Published: (2026)

Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
by: Peters, Sydney, et al.
Published: (2025)

SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
by: Cha, Jungyoub, et al.
Published: (2025)

ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering
by: Ghosh, Shubhra, et al.
Published: (2025)

Dealing with Annotator Disagreement in Hate Speech Classification
by: Dehghan, Somaiyeh, et al.
Published: (2025)

Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
by: Hong, Chunsan, et al.
Published: (2025)

On Explaining with Attention Matrices
by: Naim, Omar, et al.
Published: (2024)

A Multi-Encoder Frozen-Decoder Approach for Fine-Tuning Large Language Models
by: Dhole, Kaustubh D.
Published: (2025)

AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
by: Simplício, Afonso, et al.
Published: (2026)