Library Catalog

Bibliographic Details
Main Authors:	Hathout, Nabil; Calderone, Basilio; Namer, Fiammetta; Sajous, Franck
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.12442

Similar Items

Le sens de la famille : analyse du vocabulaire de la parenté par les plongements de mots [The sense of family: an analysis of kinship vocabulary using word embeddings]
by: Tanguy, Ludovic, et al.
Published: (2024)

DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling
by: Fedorova, Mariia, et al.
Published: (2026)

OpenStaxQA: A multilingual dataset based on open-source college textbooks
by: Gupta, Pranav
Published: (2025)

A large-scale image-text dataset benchmark for farmland segmentation
by: Tao, Chao, et al.
Published: (2025)

BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text
by: Wang, Siyan, et al.
Published: (2024)

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain
by: Masson, Maxime, et al.
Published: (2023)

WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
by: Matos, João, et al.
Published: (2024)

CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
by: Wang, Liangdong, et al.
Published: (2024)

Morphosyntactic probing of multilingual BERT models
by: Acs, Judit, et al.
Published: (2023)

AlleNoise: large-scale text classification benchmark dataset with real-world label noise
by: Rączkowska, Alicja, et al.
Published: (2024)

A multilingual hallucination benchmark: MultiWikiQHalluA
by: Thoresen, Freja, et al.
Published: (2026)

Continuous sentiment scores for literary and multilingual contexts
by: Lyngbaek, Laurits, et al.
Published: (2025)

Retrieval-augmented generation in multilingual settings
by: Chirkova, Nadezhda, et al.
Published: (2024)

A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages
by: Aliyu, Saminu Mohammad, et al.
Published: (2024)

600k-ks-ocr: a large-scale synthetic dataset for optical character recognition in Kashmiri script
by: Malik, Haq Nawaz
Published: (2026)

Understanding the role of FFNs in driving multilingual behaviour in LLMs
by: Bhattacharya, Sunit, et al.
Published: (2024)

Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval
by: Jouanneau, Warren, et al.
Published: (2024)

A multimodal multiplex of the mental lexicon for multilingual individuals
by: Huynh, Maria, et al.
Published: (2025)

Whale: Large-Scale multilingual ASR model with w2v-BERT and E-Branchformer with large speech data
by: Kashiwagi, Yosuke, et al.
Published: (2025)

On the limited utility of parallel data for learning shared multilingual representations
by: Leino, Julius, et al.
Published: (2026)

MultiCaption: Detecting disinformation using multilingual visual claims
by: Frade, Rafael Martins, et al.
Published: (2026)

EuroGEST: Investigating gender stereotypes in multilingual language models
by: Rowe, Jacqueline, et al.
Published: (2025)

Towards a resource for multilingual lexicons: an MT assisted and human-in-the-loop multilingual parallel corpus with multi-word expression annotation
by: Han, Lifeng, et al.
Published: (2020)

A benchmark dataset for evaluating Syndrome Differentiation and Treatment in large language models
by: Li, Kunning, et al.
Published: (2025)

SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
by: Kamzela, Wiktor, et al.
Published: (2025)

Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation
by: Tan, Shaomu, et al.
Published: (2024)

Understanding the effects of language-specific class imbalance in multilingual fine-tuning
by: Jung, Vincent, et al.
Published: (2024)

Automatic register identification for the open web using multilingual deep learning
by: Henriksson, Erik, et al.
Published: (2024)

Scalable multilingual PII annotation for responsible AI in LLMs
by: Meena, Bharti, et al.
Published: (2025)

Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs
by: Abdaljalil, Samir, et al.
Published: (2026)

Effective vocabulary expanding of multilingual language models for extremely low-resource languages
by: Zheng, Jianyu
Published: (2026)

SENSE models: an open source solution for multilingual and multimodal semantic-based tasks
by: Mdhaffar, Salima, et al.
Published: (2025)

Information availability in different languages and various technological constraints related to multilinguism on the Internet
by: Khosla, Sonal, et al.
Published: (2025)

FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings
by: Kesiraju, Santosh, et al.
Published: (2026)

MM-THEBench: Do Reasoning MLLMs Think Reasonably?
by: Huang, Zhidian, et al.
Published: (2026)

Disentangling concept semantics via multilingual averaging in Sparse Autoencoders
by: O'Reilly, Cliff, et al.
Published: (2025)

ZIPA: A family of efficient models for multilingual phone recognition
by: Zhu, Jian, et al.
Published: (2025)

Artificial intelligence language technologies in multilingual healthcare: Grand challenges ahead
by: Briva-Iglesias, Vicent
Published: (2026)

Mapping the Web of Science, a large-scale graph and text-based dataset with LLM embeddings
by: Kunt, Tim, et al.
Published: (2026)

One ruler to measure them all: Benchmarking multilingual long-context language models
by: Kim, Yekyung, et al.
Published: (2025)