| Main Authors: | Rom, Aviad; Bar, Kfir |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.16065 |
Similar Items
Silent Tokens, Loud Effects: Padding in LLMs
by: Himelstein, Rom, et al.
Published: (2025)
Training Bilingual LMs with Data Constraints in the Targeted Language
by: Seto, Skyler, et al.
Published: (2024)
Leveraging NTPs for Efficient Hallucination Detection in VLMs
by: Azachi, Ofir, et al.
Published: (2025)
Kanana: Compute-efficient Bilingual Language Models
by: Kanana LLM Team, et al.
Published: (2025)
Efficient Training of Language Models with Compact and Consistent Next Token Distributions
by: Sathe, Ashutosh, et al.
Published: (2024)
How a Bilingual LM Becomes Bilingual: Tracing Internal Representations with Sparse Autoencoders
by: Inaba, Tatsuro, et al.
Published: (2025)
CroissantLLM: A Truly Bilingual French-English Language Model
by: Faysse, Manuel, et al.
Published: (2024)
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
by: Tran, Toan, et al.
Published: (2025)
Rethinking Token Reduction for State Space Models
by: Zhan, Zheng, et al.
Published: (2024)
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
by: Chen, Runjin, et al.
Published: (2025)
Improving Diffusion Language Model Decoding through Joint Search in Generation Order and Token Space
by: Shen, Yangyi, et al.
Published: (2026)
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
by: Slagle, Kevin
Published: (2024)
Training Large Language Models To Reason In Parallel With Global Forking Tokens
by: Jia, Sheng, et al.
Published: (2025)
Think before you speak: Training Language Models With Pause Tokens
by: Goyal, Sachin, et al.
Published: (2023)
CharED: Character-wise Ensemble Decoding for Large Language Models
by: Gu, Kevin, et al.
Published: (2024)
On Bilingual Lexicon Induction with Large Language Models
by: Li, Yaoyiran, et al.
Published: (2023)
Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming
by: Zhang, Demi, et al.
Published: (2024)
Mapping Post-Training Forgetting in Language Models at Scale
by: Harmon, Jackson, et al.
Published: (2025)
MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
by: Chai, Di, et al.
Published: (2025)
Language Modeling with Learned Meta-Tokens
by: Shah, Alok N., et al.
Published: (2025)
Parallel Token Prediction for Language Models
by: Draxler, Felix, et al.
Published: (2025)
Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation
by: Heo, DongNyeong, et al.
Published: (2023)
The Impact of Language Mixing on Bilingual LLM Reasoning
by: Li, Yihao, et al.
Published: (2025)
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
by: Shao, Chenze, et al.
Published: (2024)
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
by: Wang, Jingcun, et al.
Published: (2024)
Revisiting Character-level Adversarial Attacks for Language Models
by: Rocamora, Elias Abad, et al.
Published: (2024)
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
by: Ruder, Sebastian, et al.
Published: (2018)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
by: Zhang, Ge, et al.
Published: (2024)
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models
by: Bai, Yang, et al.
Published: (2024)
Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models
by: Zeng, Linda, et al.
Published: (2026)
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
by: Riabi, Arij, et al.
Published: (2021)
Reasoning Bias of Next Token Prediction Training
by: Lin, Pengxiao, et al.
Published: (2025)
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
by: Deng, Chunyuan, et al.
Published: (2026)
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
by: Zhu, Mingcheng, et al.
Published: (2026)
Shared Global and Local Geometry of Language Model Embeddings
by: Lee, Andrew, et al.
Published: (2025)
The Geometry of Tokens in Internal Representations of Large Language Models
by: Viswanathan, Karthik, et al.
Published: (2025)
Silenced Biases: The Dark Side LLMs Learned to Refuse
by: Himelstein, Rom, et al.
Published: (2025)
Smart Bilingual Focused Crawling of Parallel Documents
by: García-Romero, Cristian, et al.
Published: (2024)