:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yu; Shen, Sheng; Munos, Rémi; Zhan, Hongyuan; Tian, Yuandong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.12635

Similar Items

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
by: Tian, Yuandong, et al.
Published: (2023)

Efficient Streaming Language Models with Attention Sinks
by: Xiao, Guangxuan, et al.
Published: (2023)

Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)

You Only Use Reactive Attention Slice For Long Context Retrieval
by: Soh, Yun Joon, et al.
Published: (2024)

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)

Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)

TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
by: Zeng, Runjia, et al.
Published: (2026)

Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization
by: Foroutan, Negar, et al.
Published: (2025)

Contextual Position Encoding: Learning to Count What's Important
by: Golovneva, Olga, et al.
Published: (2024)

AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG
by: Peng, Chao, et al.
Published: (2025)

SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Modeling
by: Liu, Dong, et al.
Published: (2025)

GRACE: Generative Recommendation via Journey-Aware Sparse Attention on Chain-of-Thought Tokenization
by: Ma, Luyi, et al.
Published: (2025)

GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
by: Zhou, Yang, et al.
Published: (2025)

RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
by: Yang, Kevin, et al.
Published: (2023)

MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning
by: Lu, Yiyang, et al.
Published: (2026)

Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)

Group Representational Position Encoding
by: Zhang, Yifan, et al.
Published: (2025)

Understanding Token Probability Encoding in Output Embeddings
by: Cho, Hakaze, et al.
Published: (2024)

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
by: Chen, Yuhan, et al.
Published: (2024)

Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
by: Yun, Jungmin, et al.
Published: (2024)

Re-Initialization Token Learning for Tool-Augmented Large Language Models
by: Li, Chenghao, et al.
Published: (2025)

Token Alignment via Character Matching for Subword Completion
by: Athiwaratkun, Ben, et al.
Published: (2024)

Do LLMs Encode Functional Importance of Reasoning Tokens?
by: Singh, Janvijay, et al.
Published: (2026)

Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)

Attention Basin: Why Contextual Position Matters in Large Language Models
by: Yi, Zihao, et al.
Published: (2025)

Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)

Many-Tier Instruction Hierarchy in LLM Agents
by: Zhang, Jingyu, et al.
Published: (2026)

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
by: Liu, Shiqi, et al.
Published: (2026)

Token Homogenization under Positional Bias
by: Yusupov, Viacheslav, et al.
Published: (2025)

Empowering Character-level Text Infilling by Eliminating Sub-Tokens
by: Ren, Houxing, et al.
Published: (2024)

Mitigating Coordinate Prediction Bias from Positional Encoding Failures
by: Tao, Xingjian, et al.
Published: (2025)

ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
by: Du, Yufeng, et al.
Published: (2026)

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness
by: Wang, Zekun, et al.
Published: (2024)

Token-Budget-Aware LLM Reasoning
by: Han, Tingxu, et al.
Published: (2024)

Technical Report: Impact of Position Bias on Language Models in Token Classification
by: Amor, Mehdi Ben, et al.
Published: (2023)

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
by: Chen, Yu, et al.
Published: (2026)

Training Text-to-Molecule Models with Context-Aware Tokenization
by: Kim, Seojin, et al.
Published: (2025)

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)