Bibliographic Details
Main Author: Karpov, Dmitry
Format: Preprint
Published: 2026
Subjects: Computation and Language; Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2602.04442
author Karpov, Dmitry
author_facet Karpov, Dmitry
contents We explore machine translation for five Turkic language pairs: Russian-Bashkir, Russian-Kazakh, Russian-Kyrgyz, English-Tatar, English-Chuvash. Fine-tuning nllb-200-distilled-600M with LoRA on synthetic data achieved chrF++ 49.71 for Kazakh and 46.94 for Bashkir. Prompting DeepSeek-V3.2 with retrieved similar examples achieved chrF++ 39.47 for Chuvash. For Tatar, zero-shot or retrieval-based approaches achieved chrF++ 41.6, while for Kyrgyz the zero-shot approach reached 45.6. We release the dataset and the obtained weights.
format Preprint
id arxiv_https___arxiv_org_abs_2602_04442
institution arXiv
publishDate 2026
record_format arxiv
title No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2602.04442