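Staff View: An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages :: Library Catalog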

Bibliographic Details
Main Authors:	Lu, Yinhan; Jhajj, Gaganpreet; Zhang, Chen; Andy, Anietie; Adelani, David Ifeoluwa
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.02596

_version_	1866913006689976320
author	Lu, Yinhan; Jhajj, Gaganpreet; Zhang, Chen; Andy, Anietie; Adelani, David Ifeoluwa
author_facet	Lu, Yinhan; Jhajj, Gaganpreet; Zhang, Chen; Andy, Anietie; Adelani, David Ifeoluwa
contents	In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks from a few examples, making it promising for languages underrepresented in pre-training. Recent work on many-shot ICL suggests that modern LLMs can further benefit from larger ICL examples enabled by their long context windows. However, such gains depend on careful example selection, and the inference cost can be prohibitive for low-resource language communities. In this paper, we present an empirical study of many-shot ICL for machine translation from English into ten truly low-resource languages recently added to FLORES+. We analyze the effects of retrieving more informative examples, using out-of-domain data, and ordering examples by length. Our findings show that many-shot ICL becomes more effective as the number of examples increases. More importantly, we show that BM25-based retrieval substantially improves data efficiency: 50 retrieved examples roughly match 250 many-shot examples, while 250 retrieved examples perform similarly to 1,000 many-shot examples.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_02596
institution	arXiv
publishDate	2026
record_format	arxiv
title	An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages
topic	Computation and Language
url	https://arxiv.org/abs/2604.02596
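
Note on the contents field: the abstract reports that BM25-based retrieval of in-context examples improves data efficiency. The following is a minimal sketch of that idea, not the authors' implementation. It assumes whitespace tokenization, the common Okapi BM25 scoring formula with k1 = 1.5 and b = 0.75, and a hypothetical pool of (English, translation) pairs. The names BM25 and retrieve_examples are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the paper's choice is not stated here).
    return text.lower().split()


class BM25:
    """Okapi BM25 scorer over a small in-memory collection of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        # k1 and b are common defaults, assumed for illustration only.
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        # Non-negative IDF variant: log(1 + (n - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            f = tf[t]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total


def retrieve_examples(query, pool, k):
    """Return the k (source, target) pairs whose source side best matches the query."""
    if not pool:
        return []
    bm25 = BM25([src for src, _ in pool])
    ranked = sorted(range(len(pool)), key=lambda i: bm25.score(query, i), reverse=True)
    return [pool[i] for i in ranked[:k]]
```

The retrieved pairs would then be formatted as in-context examples ahead of the sentence to translate. Scoring only the English side keeps the sketch independent of the target language.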