| Main Authors: | Fokin, Danil; Płużyczka, Monika; Golovin, Grigory |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.19869 |
Similar Items
Gold Panning in Vocabulary: An Adaptive Method for Vocabulary Expansion of Domain-Specific LLMs
by: Liu, Chengyuan, et al.
Published: (2024)
ISSR: Iterative Selection with Self-Review for Vocabulary Test Distractor Generation
by: Liu, Yu-Cheng, et al.
Published: (2025)
Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
by: Martínez, Gonzalo, et al.
Published: (2023)
Large Vocabulary Size Improves Large Language Models
by: Takase, Sho, et al.
Published: (2024)
Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models
by: Alrefaie, Mohamed Taher, et al.
Published: (2024)
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)
Generation with Dynamic Vocabulary
by: Liu, Yanting, et al.
Published: (2024)
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
by: Tao, Chaofan, et al.
Published: (2024)
Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size
by: Hayou, Soufiane, et al.
Published: (2025)
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
by: Shin, Haebin, et al.
Published: (2025)
Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
by: Balde, Gunjan, et al.
Published: (2024)
Speculative Decoding with a Speculative Vocabulary
by: Williams, Miles, et al.
Published: (2026)
EVOKE: Emotion Vocabulary Of Korean and English
by: Jung, Yoonwon, et al.
Published: (2026)
Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)
Token-level Ensembling of Models with Different Vocabularies
by: Wicks, Rachel, et al.
Published: (2025)
Overcoming Vocabulary Constraints with Pixel-level Fallback
by: Lotz, Jonas F., et al.
Published: (2025)
Disentangling MLP Neuron Weights in Vocabulary Space
by: Avrahamy, Asaf, et al.
Published: (2026)
DVAGen: Dynamic Vocabulary Augmented Generation
by: Du, Wei, et al.
Published: (2025)
Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation
by: De Silva, Ulindu, et al.
Published: (2025)
Open Vocabulary Panoptic Segmentation With Retrieval Augmentation
by: Sadeq, Nafis, et al.
Published: (2026)
Open-Vocabulary Federated Learning with Multimodal Prototyping
by: Zeng, Huimin, et al.
Published: (2024)
Prune or Retrain: Optimizing the Vocabulary of Multilingual Models for Estonian
by: Dorkin, Aleksei, et al.
Published: (2025)
Vocabulary Customization for Efficient Domain-Specific LLM Deployment
by: Herold, Christian, et al.
Published: (2025)
Autoencoder-Based Framework to Capture Vocabulary Quality in NLP
by: Dang, Vu Minh Hoang, et al.
Published: (2025)
Bridging the Gap between Different Vocabularies for LLM Ensemble
by: Xu, Yangyifan, et al.
Published: (2024)
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
by: Cognetta, Marco, et al.
Published: (2024)
Scaling LLM Pre-training with Vocabulary Curriculum
by: Yu, Fangyuan
Published: (2025)
Out-of-Vocabulary Sampling Boosts Speculative Decoding
by: Timor, Nadav, et al.
Published: (2025)
The Role of Vocabularies in Learning Sparse Representations for Ranking
by: Kim, Hiun, et al.
Published: (2025)
Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
by: Schreiter, Dimitri
Published: (2025)
TokAlign: Efficient Vocabulary Adaptation via Token Alignment
by: Li, Chong, et al.
Published: (2025)
Vocab Diet: Reshaping the Vocabulary of LLMs via Vector Arithmetic
by: Reif, Yuval, et al.
Published: (2025)
Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer
by: Kautsar, Muhammad Dehan Al, et al.
Published: (2025)
Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning
by: Kim, Nayeon, et al.
Published: (2025)
The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation
by: Itkonen, Oona, et al.
Published: (2026)
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
by: Zouhar, Vilém
Published: (2024)
Vocabulary-level Memory Efficiency for Language Model Fine-tuning
by: Williams, Miles, et al.
Published: (2023)
Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion
by: Mehta, Maitrey, et al.
Published: (2026)
Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning
by: Wu, Tao, et al.
Published: (2026)
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
by: Han, HyoJung, et al.
Published: (2024)