| Main Author: | Senaratna, Nuwan I. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.04124 |
Similar Items
Scaling Laws for Multilingual Language Models
by: He, Yifei, et al.
Published: (2024)
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
by: Samir, Farhan, et al.
Published: (2024)
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
by: Longpre, Shayne, et al.
Published: (2025)
Provision of Periodicals in the Libraries of Sri Lanka
by: Bandara, S. B.
Published: (1975)
Towards Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST)
by: Liu, Jiarui, et al.
Published: (2024)
Factors Influencing the Sustainability of Resource Use and Management Within Multiple Use Marine Protected Areas
by: Senaratna, S.
Published: (1999)
A Multilingual Similarity Dataset for News Article Frame
by: Chen, Xi, et al.
Published: (2024)
EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents
by: Zhu, Mengna, et al.
Published: (2024)
OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics
by: Chu, Wei, et al.
Published: (2025)
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
by: He, Haorui, et al.
Published: (2025)
DocHPLT: A Massively Multilingual Document-Level Translation Dataset
by: O'Brien, Dayyán, et al.
Published: (2025)
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
by: He, Haorui, et al.
Published: (2024)
SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset
by: Xie, Peng, et al.
Published: (2025)
CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
by: Zhu, Mengna, et al.
Published: (2024)
BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR
by: Hasan, Md. Najib, et al.
Published: (2026)
A New Massive Multilingual Dataset for High-Performance Language Technologies
by: de Gibert, Ona, et al.
Published: (2024)
A diverse Multilingual News Headlines Dataset from around the World
by: Leeb, Felix, et al.
Published: (2024)
Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task
by: Caillaut, Gaëtan, et al.
Published: (2024)
A Study on Scaling Up Multilingual News Framing Analysis
by: Akter, Syeda Sabrina, et al.
Published: (2024)
HPLT 3.0: Very Large-Scale Multilingual Resources for LLMs and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models
by: Oepen, Stephan, et al.
Published: (2025)
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
by: Qin, Libo, et al.
Published: (2024)
ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics
by: Al-Khalifa, Hend, et al.
Published: (2026)
Large Language Models: A New Approach for Privacy Policy Analysis at Scale
by: Rodriguez, David, et al.
Published: (2024)
CUTE: A Multilingual Dataset for Enhancing Cross-Lingual Knowledge Transfer in Low-Resource Languages
by: Zhuang, Wenhao, et al.
Published: (2025)
TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law
by: Xuan, Xi, et al.
Published: (2025)
FineFreq: A Multilingual Character Frequency Dataset from Web-Scale Text
by: Xu, Binbin
Published: (2025)
The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages
by: Mutisya, Hillary, et al.
Published: (2026)
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
by: Pandey, Prabhat, et al.
Published: (2025)
Temporal Scaling Law for Large Language Models
by: Xiong, Yizhe, et al.
Published: (2024)
Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages
by: Khan, Shaharukh, et al.
Published: (2026)
Scaling Laws for Conditional Emergence of Multilingual Image Captioning via Generalization from Translation
by: Spravil, Julian, et al.
Published: (2025)
TFD: A Comprehensive Structured Tibetan Foundation Dataset for Low-Resource Language Processing and Large-Scale Modeling
by: Huang, Cheng, et al.
Published: (2025)
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
by: Chen, William, et al.
Published: (2025)
The 2021 Tokyo Olympics Multilingual News Article Dataset
by: Novak, Erik, et al.
Published: (2025)
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World
by: Semnani, Sina J., et al.
Published: (2025)
Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models
by: Sinha, Samridhi Raj, et al.
Published: (2025)
Utilizing Multilingual Encoders to Improve Large Language Models for Low-Resource Languages
by: Puranegedara, Imalsha, et al.
Published: (2025)
M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction
by: Liu, Jiang, et al.
Published: (2024)
Scaling Laws for Fact Memorization of Large Language Models
by: Lu, Xingyu, et al.
Published: (2024)
Multilingual Contextualization of Large Language Models for Document-Level Machine Translation
by: Ramos, Miguel Moura, et al.
Published: (2025)