| Main Authors: | Bao, Zishuo, Leng, Jiaqi, Wang, Junxiong, Peng, Bowen, Lu, Yucheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01007 |
Similar Items
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
by: Leng, Jiaqi, et al.
Published: (2025)
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation
by: Gigant, Théo, et al.
Published: (2026)
MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)
Cross-Tokenizer LLM Distillation through a Byte-Level Interface
by: Singh, Avyav Kumar, et al.
Published: (2026)
Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
by: Wei, Jingxuan, et al.
Published: (2024)
Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers
by: Jang, Eugene, et al.
Published: (2024)
Efficient Pre-Training with Token Superposition
by: Peng, Bowen, et al.
Published: (2026)
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
by: Phan, Buu, et al.
Published: (2024)
Bayesian Optimization for Enhanced Language Models: Optimizing Acquisition Functions
by: Bao, Zishuo, et al.
Published: (2025)
BanglaByT5: Byte-Level Modelling for Bangla
by: Bhattacharyya, Pramit, et al.
Published: (2025)
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
by: Deng, Chunyuan, et al.
Published: (2026)
GPUTOK: GPU Accelerated Byte Level BPE Tokenization
by: Kadamba, Venu Gopal, et al.
Published: (2026)
Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention
by: Fountzoulas, George
Published: (2026)
AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
by: Zhang, Songming, et al.
Published: (2025)
Kronecker Embeddings: Byte-Level Structured Token Representations for Parameter-Efficient Language Models
by: Shravan, Rohan
Published: (2026)
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
by: Cui, Xiao, et al.
Published: (2024)
Rethinking Personalization in Large Language Models at the Token Level
by: Zhang, Chenheng, et al.
Published: (2026)
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
by: Lin, Xingyu, et al.
Published: (2025)
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Sequence-Level Likelihood
by: Lin, Xingyu, et al.
Published: (2026)
Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
by: Lian, Haoran, et al.
Published: (2024)
Byte BPE Tokenization as an Inverse string Homomorphism
by: Geng, Saibo, et al.
Published: (2024)
CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters
by: Feng, Zishuo, et al.
Published: (2024)
Token-Level Uncertainty-Aware Objective for Language Model Post-Training
by: Liu, Tingkai, et al.
Published: (2025)
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
by: Slagle, Kevin
Published: (2024)
Entropy-Driven Pre-Tokenization for Byte-Pair Encoding
by: Hu, Yifan, et al.
Published: (2025)
Back to Bytes: Revisiting Tokenization Through UTF-8
by: Moryossef, Amit, et al.
Published: (2025)
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
by: Yuan, Chenhan, et al.
Published: (2024)
EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer
by: Zhang, Hao, et al.
Published: (2026)
Hybrid Tokenization Strategy for DNA Language Model using Byte Pair Encoding and K-MER Methods
by: Sapkota, Ganesh, et al.
Published: (2025)
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
by: Kallini, Julie, et al.
Published: (2024)
Byte Latent Transformer: Patches Scale Better Than Tokens
by: Pagnoni, Artidoro, et al.
Published: (2024)
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
by: Zhang, Wanpeng, et al.
Published: (2024)
Beyond Tokens: Concept-Level Training Objectives for LLMs
by: Iyer, Laya, et al.
Published: (2026)
Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models
by: Zhang, Xiang, et al.
Published: (2025)
Entropy-Guided Token Dropout: Training Autoregressive Language Models with Limited Domain Data
by: Wang, Jiapeng, et al.
Published: (2025)
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
by: Shao, Chenze, et al.
Published: (2024)
Multimodal Latent Language Modeling with Next-Token Diffusion
by: Sun, Yutao, et al.
Published: (2024)
Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition
by: Lu, Qiuhao, et al.
Published: (2024)
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
by: Zhang, Zhen, et al.
Published: (2026)