| Main Authors: | Sawada, Tomohiro, Goyal, Kartik |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.06621 |
Similar Items
Cascaded Information Disclosure for Generalized Evaluation of Problem Solving Capabilities
by: Yan, Yunxiang, et al.
Published: (2025)
Batching BPE Tokenization Merges
by: Morgan, Alexander P.
Published: (2024)
LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers
by: Sun, Yike, et al.
Published: (2026)
BlockBPE: Parallel BPE Tokenization
by: You, Amos
Published: (2025)
SuperBPE: Space Travel for Language Models
by: Liu, Alisa, et al.
Published: (2025)
Morphological Typology in BPE Subword Productivity and Language Modeling
by: Parra, Iñigo
Published: (2024)
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
by: Hayase, Jonathan, et al.
Published: (2024)
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
by: Chizhov, Pavel, et al.
Published: (2024)
Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
by: Balde, Gunjan, et al.
Published: (2024)
Exploring Forgetting in Large Language Model Pre-Training
by: Liao, Chonghua, et al.
Published: (2024)
Constructing a BPE Tokenization DFA
by: Berglund, Martin, et al.
Published: (2024)
Bit-level BPE: Below the byte boundary
by: Moon, Sangwhan, et al.
Published: (2025)
Byte BPE Tokenization as an Inverse string Homomorphism
by: Geng, Saibo, et al.
Published: (2024)
Understanding Catastrophic Forgetting in Language Models via Implicit Inference
by: Kotha, Suhas, et al.
Published: (2023)
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
by: Srinivasan, Tejas, et al.
Published: (2024)
Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
by: Lyu, Mengxian, et al.
Published: (2026)
From Characters to Tokens: Dynamic Grouping with Hierarchical BPE
by: Dolga, Rares, et al.
Published: (2025)
AdaptBPE: From General Purpose to Specialized Tokenizers
by: Liyanage, Vijini, et al.
Published: (2026)
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
by: Cognetta, Marco, et al.
Published: (2024)
Pretraining Language Models with Subword Regularization: An Empirical Study of BPE Dropout in Low-Resource NLP
by: Visser, Ruan, et al.
Published: (2026)
Mapping Post-Training Forgetting in Language Models at Scale
by: Harmon, Jackson, et al.
Published: (2025)
Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
by: Lian, Haoran, et al.
Published: (2024)
MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy
by: Yoshida, Davis, et al.
Published: (2023)
BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization
by: Land, Sander, et al.
Published: (2025)
Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability
by: Chang, Tyler A., et al.
Published: (2023)
ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language
by: Lidayan, Aly, et al.
Published: (2025)
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
by: Yu, Zewei, et al.
Published: (2026)
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
by: Watts, Ishaan, et al.
Published: (2026)
On the Limits of Model Merging for Multilinguality in Pre-Training
by: Aycock, Seth, et al.
Published: (2026)
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
by: Asgari, Ehsaneddin, et al.
Published: (2025)
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
by: Yu, Zeping, et al.
Published: (2025)
Evaluating Subword Tokenization Techniques for Bengali: A Benchmark Study with BengaliBPE
by: Patwary, Firoj Ahmmed, et al.
Published: (2025)
GPUTOK: GPU Accelerated Byte Level BPE Tokenization
by: Kadamba, Venu Gopal, et al.
Published: (2026)
1bit-Merging: Dynamic Quantized Merging for Large Language Models
by: Liu, Shuqi, et al.
Published: (2025)
Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning
by: Gupta, Prakhar, et al.
Published: (2026)
SeMe: Training-Free Language Model Merging via Semantic Alignment
by: Gu, Jian, et al.
Published: (2025)
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
by: Na, Clara, et al.
Published: (2024)
Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
by: Takashiro, Shota, et al.
Published: (2024)
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
by: Morrison, Jacob, et al.
Published: (2024)
Transport and Merge: Cross-Architecture Merging for Large Language Models
by: Cui, Chenhang, et al.
Published: (2026)