| Main Authors: | Peng, Bowen, Gigant, Théo, Quesnelle, Jeffrey |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.06546 |
Similar Items
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation
by: Gigant, Théo, et al.
Published: (2026)
Long Context Pre-Training with Lighthouse Attention
by: Peng, Bowen, et al.
Published: (2026)
YaRN: Efficient Context Window Extension of Large Language Models
by: Peng, Bowen, et al.
Published: (2023)
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
by: Gigant, Théo, et al.
Published: (2025)
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
by: Gigant, Théo, et al.
Published: (2024)
Hermes 3 Technical Report
by: Teknium, Ryan, et al.
Published: (2024)
Distilling Token-Trained Models into Byte-Level Models
by: Bao, Zishuo, et al.
Published: (2026)
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
by: Chen, Yangyi, et al.
Published: (2025)
Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pre-trained Models
by: Purason, Taido, et al.
Published: (2025)
Gold-Switch: Training-Free Superposition of Slow- and Fast- Thinking LLMs
by: Lee, Jaeseong, et al.
Published: (2025)
Pre-Training Curriculum for Multi-Token Prediction in Language Models
by: Aynetdinov, Ansar, et al.
Published: (2025)
Synthetic Pre-Pre-Training Improves Language Model Robustness to Noisy Pre-Training Data
by: Guo, Xu, et al.
Published: (2026)
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
by: Lu, Keming, et al.
Published: (2024)
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
by: Yuan, Chenhan, et al.
Published: (2024)
From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs
by: Nishu, Kumari, et al.
Published: (2025)
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
by: Wang, Yubo, et al.
Published: (2026)
SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings
by: SadraeiJavaeri, MohammadAli, et al.
Published: (2024)
LLM Latent Reasoning as Chain of Superposition
by: Deng, Jingcheng, et al.
Published: (2025)
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
by: Chizhov, Pavel, et al.
Published: (2024)
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement
by: Yu, Le, et al.
Published: (2024)
Medical Vision-Language Pre-Training for Brain Abnormalities
by: Monajatipoor, Masoud, et al.
Published: (2024)
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model
by: Ding, Bowen, et al.
Published: (2025)
Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing
by: Goel, Raghavv, et al.
Published: (2026)
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
by: Li, Chen, et al.
Published: (2025)
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
by: Han, Minglun, et al.
Published: (2024)
Reinforcement Pre-Training
by: Dong, Qingxiu, et al.
Published: (2025)
Entropy-Driven Pre-Tokenization for Byte-Pair Encoding
by: Hu, Yifan, et al.
Published: (2025)
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
by: Zhao, Bowen, et al.
Published: (2024)
On the Limits of Token Reduction for Efficient Unified Vision Language Training
by: Chen, Siyi, et al.
Published: (2026)
Efficient Training of Language Models with Compact and Consistent Next Token Distributions
by: Sathe, Ashutosh, et al.
Published: (2024)
Token Masking Improves Transformer-Based Text Classification
by: Xu, Xianglong, et al.
Published: (2025)
NITP: Next Implicit Token Prediction for LLM Pre-training
by: Zhang, Xiangdong, et al.
Published: (2026)
Less Languages, Less Tokens: An Efficient Unified Logic Cross-lingual Chain-of-Thought Reasoning Framework
by: Zhang, Chenyuan, et al.
Published: (2026)
Detecting Concrete Visual Tokens for Multimodal Machine Translation
by: Bowen, Braeden, et al.
Published: (2024)
A General and Efficient Training for Transformer via Token Expansion
by: Huang, Wenxuan, et al.
Published: (2024)
Superposition Yields Robust Neural Scaling
by: Liu, Yizhou, et al.
Published: (2025)
Out of the Memory Barrier: A Highly Memory Efficient Training System for LLMs with Million-Token Contexts
by: Li, Wenhao, et al.
Published: (2026)
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
by: Li, Guihong, et al.
Published: (2025)
Efficient Switchable Safety Control in LLMs via Magic-Token-Guided Co-Training
by: Si, Jianfeng, et al.
Published: (2025)
Few-shot Named Entity Recognition via Superposition Concept Discrimination
by: Chen, Jiawei, et al.
Published: (2024)