:: Library Catalog

Bibliographic Details
Main Authors:	Fokin, Danil; Płużyczka, Monika; Golovin, Grigory
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.19869

Similar Items

Gold Panning in Vocabulary: An Adaptive Method for Vocabulary Expansion of Domain-Specific LLMs
by: Liu, Chengyuan, et al.
Published: (2024)

ISSR: Iterative Selection with Self-Review for Vocabulary Test Distractor Generation
by: Liu, Yu-Cheng, et al.
Published: (2025)

Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
by: Martínez, Gonzalo, et al.
Published: (2023)

Large Vocabulary Size Improves Large Language Models
by: Takase, Sho, et al.
Published: (2024)

Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models
by: Alrefaie, Mohamed Taher, et al.
Published: (2024)

A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)

Generation with Dynamic Vocabulary
by: Liu, Yanting, et al.
Published: (2024)

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
by: Tao, Chaofan, et al.
Published: (2024)

Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size
by: Hayou, Soufiane, et al.
Published: (2025)

Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
by: Shin, Haebin, et al.
Published: (2025)

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
by: Balde, Gunjan, et al.
Published: (2024)

Speculative Decoding with a Speculative Vocabulary
by: Williams, Miles, et al.
Published: (2026)

EVOKE: Emotion Vocabulary Of Korean and English
by: Jung, Yoonwon, et al.
Published: (2026)

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)

Token-level Ensembling of Models with Different Vocabularies
by: Wicks, Rachel, et al.
Published: (2025)

Overcoming Vocabulary Constraints with Pixel-level Fallback
by: Lotz, Jonas F., et al.
Published: (2025)

Disentangling MLP Neuron Weights in Vocabulary Space
by: Avrahamy, Asaf, et al.
Published: (2026)

DVAGen: Dynamic Vocabulary Augmented Generation
by: Du, Wei, et al.
Published: (2025)

Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation
by: De Silva, Ulindu, et al.
Published: (2025)

Open Vocabulary Panoptic Segmentation With Retrieval Augmentation
by: Sadeq, Nafis, et al.
Published: (2026)

Open-Vocabulary Federated Learning with Multimodal Prototyping
by: Zeng, Huimin, et al.
Published: (2024)

Prune or Retrain: Optimizing the Vocabulary of Multilingual Models for Estonian
by: Dorkin, Aleksei, et al.
Published: (2025)

Vocabulary Customization for Efficient Domain-Specific LLM Deployment
by: Herold, Christian, et al.
Published: (2025)

Autoencoder-Based Framework to Capture Vocabulary Quality in NLP
by: Dang, Vu Minh Hoang, et al.
Published: (2025)

Bridging the Gap between Different Vocabularies for LLM Ensemble
by: Xu, Yangyifan, et al.
Published: (2024)

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
by: Cognetta, Marco, et al.
Published: (2024)

Scaling LLM Pre-training with Vocabulary Curriculum
by: Yu, Fangyuan
Published: (2025)

Out-of-Vocabulary Sampling Boosts Speculative Decoding
by: Timor, Nadav, et al.
Published: (2025)

The Role of Vocabularies in Learning Sparse Representations for Ranking
by: Kim, Hiun, et al.
Published: (2025)

Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
by: Schreiter, Dimitri
Published: (2025)

TokAlign: Efficient Vocabulary Adaptation via Token Alignment
by: Li, Chong, et al.
Published: (2025)

Vocab Diet: Reshaping the Vocabulary of LLMs via Vector Arithmetic
by: Reif, Yuval, et al.
Published: (2025)

Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer
by: Kautsar, Muhammad Dehan Al, et al.
Published: (2025)

Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning
by: Kim, Nayeon, et al.
Published: (2025)

The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation
by: Itkonen, Oona, et al.
Published: (2026)

Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
by: Zouhar, Vilém
Published: (2024)

Vocabulary-level Memory Efficiency for Language Model Fine-tuning
by: Williams, Miles, et al.
Published: (2023)

Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion
by: Mehta, Maitrey, et al.
Published: (2026)

Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning
by: Wu, Tao, et al.
Published: (2026)

Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
by: Han, HyoJung, et al.
Published: (2024)