| Main Author: | Zheng, Jianyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09388 |
Similar Items
InkubaLM: A small language model for low-resource African languages
by: Tonja, Atnafu Lambebo, et al.
Published: (2024)
EuroGEST: Investigating gender stereotypes in multilingual language models
by: Rowe, Jacqueline, et al.
Published: (2025)
Morphological evaluation of subwords vocabulary used by BETO language model
by: García-Sierra, Óscar, et al.
Published: (2024)
The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
by: Gouvert, Olivier, et al.
Published: (2025)
One ruler to measure them all: Benchmarking multilingual long-context language models
by: Kim, Yekyung, et al.
Published: (2025)
Cross-lingual transfer of multilingual models on low resource African Languages
by: Thangaraj, Harish, et al.
Published: (2024)
Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages
by: Filip, Tomáš, et al.
Published: (2024)
SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
by: Kamzela, Wiktor, et al.
Published: (2025)
Understanding the effects of language-specific class imbalance in multilingual fine-tuning
by: Jung, Vincent, et al.
Published: (2024)
Phonetically rich corpus construction for a low-resourced language
by: Amadeus, Marcellus, et al.
Published: (2024)
Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages?
by: Dipto, Tawsif Tashwar, et al.
Published: (2025)
Multilingual jailbreaking of LLMs using low-resource languages
by: Marx, Dylan, et al.
Published: (2026)
Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles
by: Bhattacharya, Antara Raaghavi, et al.
Published: (2025)
Artificial intelligence language technologies in multilingual healthcare: Grand challenges ahead
by: Briva-Iglesias, Vicent
Published: (2026)
Information availability in different languages and various technological constraints related to multilinguism on the Internet
by: Khosla, Sonal, et al.
Published: (2025)
Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation
by: Chirkova, Nadezhda, et al.
Published: (2023)
Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources
by: Sukeda, Issey
Published: (2024)
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
by: Matos, João, et al.
Published: (2024)
Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages
by: Lankford, Séamus, et al.
Published: (2024)
A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages
by: Aliyu, Saminu Mohammad, et al.
Published: (2024)
The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
How do datasets, developers, and models affect biases in a low-resourced language?: The Case of the Bengali Language
by: Das, Dipto, et al.
Published: (2025)
A comparison of pipelines for the translation of a low resource language based on transformers
by: Bonfanti, Chiara, et al.
Published: (2025)
MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages
by: Hupkes, Dieuwke, et al.
Published: (2025)
Building low-resource African language corpora: A case study of Kidawida, Kalenjin and Dholuo
by: Mbogho, Audrey, et al.
Published: (2025)
A multilingual training strategy for low resource Text to Speech
by: Amalas, Asma, et al.
Published: (2024)
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
by: Zhu, Jian, et al.
Published: (2023)
Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
by: Jimoh, Toheeb Aduramomi, et al.
Published: (2026)
Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages
by: Elsner, Micha, et al.
Published: (2025)
Large language models have learned to use language
by: Lupyan, Gary
Published: (2025)
Does language matter for spoken word classification? A multilingual generative meta-learning approach
by: Ziki, Batsirayi Mupamhi, et al.
Published: (2026)
Large language models are not about natural language
by: Bolhuis, Johan J., et al.
Published: (2025)
Dissociating language and thought in large language models
by: Mahowald, Kyle, et al.
Published: (2023)
Retrieval augmentation of large language models for lay language generation
by: Guo, Yue, et al.
Published: (2022)
Do large language models resemble humans in language use?
by: Cai, Zhenguang G., et al.
Published: (2023)
Studies with impossible languages falsify LMs as models of human language
by: Bowers, Jeffrey S., et al.
Published: (2025)
Can we teach language models to gloss endangered languages?
by: Ginn, Michael, et al.
Published: (2024)
Tiny language models
by: Gross, Ronit D., et al.
Published: (2025)
synthocr-gen: A synthetic ocr dataset generator for low-resource languages - breaking the data barrier
by: Malik, Haq Nawaz, et al.
Published: (2026)
Ukrainian-to-English folktale corpus: Parallel corpus creation and augmentation for machine translation in low-resource languages
by: Burda-Lassen, Olena
Published: (2024)