:: Library Catalog

Bibliographic Details
Main Authors:	Sawada, Tomohiro; Goyal, Kartik
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2508.06621

Similar Items

Cascaded Information Disclosure for Generalized Evaluation of Problem Solving Capabilities
by: Yan, Yunxiang, et al.
Published: (2025)

Batching BPE Tokenization Merges
by: Morgan, Alexander P.
Published: (2024)

LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers
by: Sun, Yike, et al.
Published: (2026)

BlockBPE: Parallel BPE Tokenization
by: You, Amos
Published: (2025)

SuperBPE: Space Travel for Language Models
by: Liu, Alisa, et al.
Published: (2025)

Morphological Typology in BPE Subword Productivity and Language Modeling
by: Parra, Iñigo
Published: (2024)

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
by: Hayase, Jonathan, et al.
Published: (2024)

BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
by: Chizhov, Pavel, et al.
Published: (2024)

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
by: Balde, Gunjan, et al.
Published: (2024)

Exploring Forgetting in Large Language Model Pre-Training
by: Liao, Chonghua, et al.
Published: (2024)

Constructing a BPE Tokenization DFA
by: Berglund, Martin, et al.
Published: (2024)

Bit-level BPE: Below the byte boundary
by: Moon, Sangwhan, et al.
Published: (2025)

Byte BPE Tokenization as an Inverse string Homomorphism
by: Geng, Saibo, et al.
Published: (2024)

Understanding Catastrophic Forgetting in Language Models via Implicit Inference
by: Kotha, Suhas, et al.
Published: (2023)

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
by: Srinivasan, Tejas, et al.
Published: (2024)

Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
by: Lyu, Mengxian, et al.
Published: (2026)

From Characters to Tokens: Dynamic Grouping with Hierarchical BPE
by: Dolga, Rares, et al.
Published: (2025)

AdaptBPE: From General Purpose to Specialized Tokenizers
by: Liyanage, Vijini, et al.
Published: (2026)

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
by: Cognetta, Marco, et al.
Published: (2024)

Pretraining Language Models with Subword Regularization: An Empirical Study of BPE Dropout in Low-Resource NLP
by: Visser, Ruan, et al.
Published: (2026)

Mapping Post-Training Forgetting in Language Models at Scale
by: Harmon, Jackson, et al.
Published: (2025)

Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
by: Lian, Haoran, et al.
Published: (2024)

MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy
by: Yoshida, Davis, et al.
Published: (2023)

BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization
by: Land, Sander, et al.
Published: (2025)

Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability
by: Chang, Tyler A., et al.
Published: (2023)

ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language
by: Lidayan, Aly, et al.
Published: (2025)

Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
by: Yu, Zewei, et al.
Published: (2026)

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
by: Watts, Ishaan, et al.
Published: (2026)

On the Limits of Model Merging for Multilinguality in Pre-Training
by: Aycock, Seth, et al.
Published: (2026)

MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
by: Asgari, Ehsaneddin, et al.
Published: (2025)

Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
by: Yu, Zeping, et al.
Published: (2025)

Evaluating Subword Tokenization Techniques for Bengali: A Benchmark Study with BengaliBPE
by: Patwary, Firoj Ahmmed, et al.
Published: (2025)

GPUTOK: GPU Accelerated Byte Level BPE Tokenization
by: Kadamba, Venu Gopal, et al.
Published: (2026)

1bit-Merging: Dynamic Quantized Merging for Large Language Models
by: Liu, Shuqi, et al.
Published: (2025)

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning
by: Gupta, Prakhar, et al.
Published: (2026)

SeMe: Training-Free Language Model Merging via Semantic Alignment
by: Gu, Jian, et al.
Published: (2025)

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
by: Na, Clara, et al.
Published: (2024)

Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
by: Takashiro, Shota, et al.
Published: (2024)

Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
by: Morrison, Jacob, et al.
Published: (2024)

Transport and Merge: Cross-Architecture Merging for Large Language Models
by: Cui, Chenhang, et al.
Published: (2026)