| Main Authors: | Wang, Yu, Shen, Sheng, Munos, Rémi, Zhan, Hongyuan, Tian, Yuandong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.12635 |
Similar Items
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
by: Tian, Yuandong, et al.
Published: (2023)
Efficient Streaming Language Models with Attention Sinks
by: Xiao, Guangxuan, et al.
Published: (2023)
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)
You Only Use Reactive Attention Slice For Long Context Retrieval
by: Soh, Yun Joon, et al.
Published: (2024)
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)
TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
by: Zeng, Runjia, et al.
Published: (2026)
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization
by: Foroutan, Negar, et al.
Published: (2025)
Contextual Position Encoding: Learning to Count What's Important
by: Golovneva, Olga, et al.
Published: (2024)
AdaGReS:Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG
by: Peng, Chao, et al.
Published: (2025)
SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Modeling
by: Liu, Dong, et al.
Published: (2025)
GRACE: Generative Recommendation via Journey-Aware Sparse Attention on Chain-of-Thought Tokenization
by: Ma, Luyi, et al.
Published: (2025)
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
by: Zhou, Yang, et al.
Published: (2025)
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
by: Yang, Kevin, et al.
Published: (2023)
MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning
by: Lu, Yiyang, et al.
Published: (2026)
Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)
Group Representational Position Encoding
by: Zhang, Yifan, et al.
Published: (2025)
Understanding Token Probability Encoding in Output Embeddings
by: Cho, Hakaze, et al.
Published: (2024)
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
by: Chen, Yuhan, et al.
Published: (2024)
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
by: Yun, Jungmin, et al.
Published: (2024)
Re-Initialization Token Learning for Tool-Augmented Large Language Models
by: Li, Chenghao, et al.
Published: (2025)
Token Alignment via Character Matching for Subword Completion
by: Athiwaratkun, Ben, et al.
Published: (2024)
Do LLMs Encode Functional Importance of Reasoning Tokens?
by: Singh, Janvijay, et al.
Published: (2026)
Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)
Attention Basin: Why Contextual Position Matters in Large Language Models
by: Yi, Zihao, et al.
Published: (2025)
Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)
Many-Tier Instruction Hierarchy in LLM Agents
by: Zhang, Jingyu, et al.
Published: (2026)
STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
by: Liu, Shiqi, et al.
Published: (2026)
Token Homogenization under Positional Bias
by: Yusupov, Viacheslav, et al.
Published: (2025)
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
by: Ren, Houxing, et al.
Published: (2024)
Mitigating Coordinate Prediction Bias from Positional Encoding Failures
by: Tao, Xingjian, et al.
Published: (2025)
ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)
RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
by: Du, Yufeng, et al.
Published: (2026)
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness
by: Wang, Zekun, et al.
Published: (2024)
Token-Budget-Aware LLM Reasoning
by: Han, Tingxu, et al.
Published: (2024)
Technical Report: Impact of Position Bias on Language Models in Token Classification
by: Amor, Mehdi Ben, et al.
Published: (2023)
HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
by: Chen, Yu, et al.
Published: (2026)
Training Text-to-Molecule Models with Context-Aware Tokenization
by: Kim, Seojin, et al.
Published: (2025)
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)