:: Library Catalog

Bibliographic Details
Main Author:	Zheng, Jianyu
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.09388

Similar Items

InkubaLM: A small language model for low-resource African languages
by: Tonja, Atnafu Lambebo, et al.
Published: (2024)

EuroGEST: Investigating gender stereotypes in multilingual language models
by: Rowe, Jacqueline, et al.
Published: (2025)

Morphological evaluation of subwords vocabulary used by BETO language model
by: García-Sierra, Óscar, et al.
Published: (2024)

The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
by: Gouvert, Olivier, et al.
Published: (2025)

One ruler to measure them all: Benchmarking multilingual long-context language models
by: Kim, Yekyung, et al.
Published: (2025)

Cross-lingual transfer of multilingual models on low resource African Languages
by: Thangaraj, Harish, et al.
Published: (2024)

Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages
by: Filip, Tomáš, et al.
Published: (2024)

SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
by: Kamzela, Wiktor, et al.
Published: (2025)

Understanding the effects of language-specific class imbalance in multilingual fine-tuning
by: Jung, Vincent, et al.
Published: (2024)

Phonetically rich corpus construction for a low-resourced language
by: Amadeus, Marcellus, et al.
Published: (2024)

Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages?
by: Dipto, Tawsif Tashwar, et al.
Published: (2025)

Multilingual jailbreaking of LLMs using low-resource languages
by: Marx, Dylan, et al.
Published: (2026)

Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles
by: Bhattacharya, Antara Raaghavi, et al.
Published: (2025)

Artificial intelligence language technologies in multilingual healthcare: Grand challenges ahead
by: Briva-Iglesias, Vicent
Published: (2026)

Information availability in different languages and various technological constraints related to multilinguism on the Internet
by: Khosla, Sonal, et al.
Published: (2025)

Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation
by: Chirkova, Nadezhda, et al.
Published: (2023)

Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources
by: Sukeda, Issey
Published: (2024)

WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
by: Matos, João, et al.
Published: (2024)

Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages
by: Lankford, Séamus, et al.
Published: (2024)

A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages
by: Aliyu, Saminu Mohammad, et al.
Published: (2024)

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)

How do datasets, developers, and models affect biases in a low-resourced language?: The Case of the Bengali Language
by: Das, Dipto, et al.
Published: (2025)

A comparison of pipelines for the translation of a low resource language based on transformers
by: Bonfanti, Chiara, et al.
Published: (2025)

MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages
by: Hupkes, Dieuwke, et al.
Published: (2025)

Building low-resource African language corpora: A case study of Kidawida, Kalenjin and Dholuo
by: Mbogho, Audrey, et al.
Published: (2025)

A multilingual training strategy for low resource Text to Speech
by: Amalas, Asma, et al.
Published: (2024)

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
by: Zhu, Jian, et al.
Published: (2023)

Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
by: Jimoh, Toheeb Aduramomi, et al.
Published: (2026)

Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages
by: Elsner, Micha, et al.
Published: (2025)

Large language models have learned to use language
by: Lupyan, Gary
Published: (2025)

Does language matter for spoken word classification? A multilingual generative meta-learning approach
by: Ziki, Batsirayi Mupamhi, et al.
Published: (2026)

Large language models are not about natural language
by: Bolhuis, Johan J., et al.
Published: (2025)

Dissociating language and thought in large language models
by: Mahowald, Kyle, et al.
Published: (2023)

Retrieval augmentation of large language models for lay language generation
by: Guo, Yue, et al.
Published: (2022)

Do large language models resemble humans in language use?
by: Cai, Zhenguang G., et al.
Published: (2023)

Studies with impossible languages falsify LMs as models of human language
by: Bowers, Jeffrey S., et al.
Published: (2025)

Can we teach language models to gloss endangered languages?
by: Ginn, Michael, et al.
Published: (2024)

Tiny language models
by: Gross, Ronit D., et al.
Published: (2025)

synthocr-gen: A synthetic OCR dataset generator for low-resource languages - breaking the data barrier
by: Malik, Haq Nawaz, et al.
Published: (2026)

Ukrainian-to-English folktale corpus: Parallel corpus creation and augmentation for machine translation in low-resource languages
by: Burda-Lassen, Olena
Published: (2024)