| Main Authors: | Hathout, Nabil; Calderone, Basilio; Namer, Fiammetta; Sajous, Franck |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.12442 |
Similar Items
Le sens de la famille : analyse du vocabulaire de la parenté par les plongements de mots
by: Tanguy, Ludovic, et al.
Published: (2024)
DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling
by: Fedorova, Mariia, et al.
Published: (2026)
OpenStaxQA: A multilingual dataset based on open-source college textbooks
by: Gupta, Pranav
Published: (2025)
A large-scale image-text dataset benchmark for farmland segmentation
by: Tao, Chao, et al.
Published: (2025)
BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text
by: Wang, Siyan, et al.
Published: (2024)
Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain
by: Masson, Maxime, et al.
Published: (2023)
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
by: Matos, João, et al.
Published: (2024)
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
by: Wang, Liangdong, et al.
Published: (2024)
Morphosyntactic probing of multilingual BERT models
by: Acs, Judit, et al.
Published: (2023)
AlleNoise: large-scale text classification benchmark dataset with real-world label noise
by: Rączkowska, Alicja, et al.
Published: (2024)
A multilingual hallucination benchmark: MultiWikiQHalluA
by: Thoresen, Freja, et al.
Published: (2026)
Continuous sentiment scores for literary and multilingual contexts
by: Lyngbaek, Laurits, et al.
Published: (2025)
Retrieval-augmented generation in multilingual settings
by: Chirkova, Nadezhda, et al.
Published: (2024)
A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages
by: Aliyu, Saminu Mohammad, et al.
Published: (2024)
600k-ks-ocr: a large-scale synthetic dataset for optical character recognition in Kashmiri script
by: Malik, Haq Nawaz
Published: (2026)
Understanding the role of FFNs in driving multilingual behaviour in LLMs
by: Bhattacharya, Sunit, et al.
Published: (2024)
Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval
by: Jouanneau, Warren, et al.
Published: (2024)
A multimodal multiplex of the mental lexicon for multilingual individuals
by: Huynh, Maria, et al.
Published: (2025)
Whale: Large-Scale multilingual ASR model with w2v-BERT and E-Branchformer with large speech data
by: Kashiwagi, Yosuke, et al.
Published: (2025)
On the limited utility of parallel data for learning shared multilingual representations
by: Leino, Julius, et al.
Published: (2026)
MultiCaption: Detecting disinformation using multilingual visual claims
by: Frade, Rafael Martins, et al.
Published: (2026)
EuroGEST: Investigating gender stereotypes in multilingual language models
by: Rowe, Jacqueline, et al.
Published: (2025)
Towards a resource for multilingual lexicons: an MT assisted and human-in-the-loop multilingual parallel corpus with multi-word expression annotation
by: Han, Lifeng, et al.
Published: (2020)
A benchmark dataset for evaluating Syndrome Differentiation and Treatment in large language models
by: Li, Kunning, et al.
Published: (2025)
SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
by: Kamzela, Wiktor, et al.
Published: (2025)
Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation
by: Tan, Shaomu, et al.
Published: (2024)
Understanding the effects of language-specific class imbalance in multilingual fine-tuning
by: Jung, Vincent, et al.
Published: (2024)
Automatic register identification for the open web using multilingual deep learning
by: Henriksson, Erik, et al.
Published: (2024)
Scalable multilingual PII annotation for responsible AI in LLMs
by: Meena, Bharti, et al.
Published: (2025)
Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs
by: Abdaljalil, Samir, et al.
Published: (2026)
Effective vocabulary expanding of multilingual language models for extremely low-resource languages
by: Zheng, Jianyu
Published: (2026)
SENSE models: an open source solution for multilingual and multimodal semantic-based tasks
by: Mdhaffar, Salima, et al.
Published: (2025)
Information availability in different languages and various technological constraints related to multilinguism on the Internet
by: Khosla, Sonal, et al.
Published: (2025)
FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings
by: Kesiraju, Santosh, et al.
Published: (2026)
MM-THEBench: Do Reasoning MLLMs Think Reasonably?
by: Huang, Zhidian, et al.
Published: (2026)
Disentangling concept semantics via multilingual averaging in Sparse Autoencoders
by: O'Reilly, Cliff, et al.
Published: (2025)
ZIPA: A family of efficient models for multilingual phone recognition
by: Zhu, Jian, et al.
Published: (2025)
Artificial intelligence language technologies in multilingual healthcare: Grand challenges ahead
by: Briva-Iglesias, Vicent
Published: (2026)
Mapping the Web of Science, a large-scale graph and text-based dataset with LLM embeddings
by: Kunt, Tim, et al.
Published: (2026)
One ruler to measure them all: Benchmarking multilingual long-context language models
by: Kim, Yekyung, et al.
Published: (2025)