Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20009 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866908904425783296 |
|---|---|
| author | Tamo, J. Ben Carlander-Reuterfelt, Daniel Rubin, Jonathan Hong, Dezhi Wang, Mingxian Poliannikov, Oleg |
| author_facet | Tamo, J. Ben Carlander-Reuterfelt, Daniel Rubin, Jonathan Hong, Dezhi Wang, Mingxian Poliannikov, Oleg |
| contents | Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control, the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning MMLU, MGSM, and XQuAD benchmarks. To probe these issues with interpretability, we extend logit lens analysis to track language probabilities layer by layer and compute cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3-5 percent of parameters, without sacrificing task accuracy. Importantly, this result is nearly identical to that of full-scope fine-tuning (for example, above 98 percent language consistency for both methods across all prompt scenarios) but uses a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage layer-localization of language control for efficient multilingual adaptation. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2601_20009 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them? Tamo, J. Ben Carlander-Reuterfelt, Daniel Rubin, Jonathan Hong, Dezhi Wang, Mingxian Poliannikov, Oleg Computation and Language Artificial Intelligence Machine Learning Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control, the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning MMLU, MGSM, and XQuAD benchmarks. To probe these issues with interpretability, we extend logit lens analysis to track language probabilities layer by layer and compute cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3-5 percent of parameters, without sacrificing task accuracy. Importantly, this result is nearly identical to that of full-scope fine-tuning (for example, above 98 percent language consistency for both methods across all prompt scenarios) but uses a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage layer-localization of language control for efficient multilingual adaptation. |
| title | LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them? |
| topic | Computation and Language Artificial Intelligence Machine Learning |
| url | https://arxiv.org/abs/2601.20009 |