| Main Authors: | Edman, Lukas, Schmid, Helmut, Fraser, Alexander |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.17784 |
Similar Items
CUTE: Measuring LLMs' Understanding of Their Tokens
by: Edman, Lukas, et al.
Published: (2024)
Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs
by: Edman, Lukas, et al.
Published: (2025)
Are BabyLMs Second Language Learners?
by: Edman, Lukas, et al.
Published: (2024)
Beyond Literal Token Overlap: Token Alignability for Multilinguality
by: Hämmerl, Katharina, et al.
Published: (2025)
XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
by: He, Linyang, et al.
Published: (2025)
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
by: Nie, Ercong, et al.
Published: (2025)
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback
by: Lai, Wen, et al.
Published: (2024)
Extending Multilingual Machine Translation through Imitation Learning
by: Lai, Wen, et al.
Published: (2023)
Understanding Cross-Lingual Alignment -- A Survey
by: Hämmerl, Katharina, et al.
Published: (2024)
Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation
by: Edman, Lukas, et al.
Published: (2023)
EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian
by: Dementieva, Daryna, et al.
Published: (2025)
Benchmarking LLM Guardrails in Handling Multilingual Toxicity
by: Yang, Yahan, et al.
Published: (2024)
LLM in the Loop: Creating the ParaDeHate Dataset for Hate Speech Detoxification
by: Yuan, Shuzhou, et al.
Published: (2025)
Decomposed Prompting: Probing Multilingual Linguistic Structure Knowledge in Large Language Models
by: Nie, Ercong, et al.
Published: (2024)
DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking
by: Bu, Lanni, et al.
Published: (2025)
CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English
by: Dementieva, Daryna, et al.
Published: (2025)
PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions
by: Niazi, Ruhallah, et al.
Published: (2026)
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5
by: Dang, Thao Anh, et al.
Published: (2024)
MUTANT: A Recipe for Multilingual Tokenizer Design
by: Rana, Souvik, et al.
Published: (2025)
The Token Tax: Systematic Bias in Multilingual Tokenization
by: Lundin, Jessica M., et al.
Published: (2025)
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
by: Ma, Bolei, et al.
Published: (2024)
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
by: Schmidt, Fabian David, et al.
Published: (2025)
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
by: Shen, Yingli, et al.
Published: (2025)
Krutrim LLM: A Novel Tokenization Strategy for Multilingual Indic Languages with Petabyte-Scale Data Processing
by: Kumar, Rahul, et al.
Published: (2024)
DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture
by: Maji, Arijit, et al.
Published: (2025)
SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia
by: Yue, Pengfei, et al.
Published: (2026)
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
by: Pattnayak, Priyaranjan, et al.
Published: (2026)
FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual Grounding
by: Shen, Yingli, et al.
Published: (2026)
Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking
by: Hu, Songbo, et al.
Published: (2026)
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
by: Son, Guijin, et al.
Published: (2024)
JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and JDs
by: Carrino, Casimiro Pio, et al.
Published: (2026)
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
by: Yoo, Haneul, et al.
Published: (2024)
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
by: Abagyan, Diana, et al.
Published: (2025)
Contamination Report for Multilingual Benchmarks
by: Ahuja, Sanchit, et al.
Published: (2024)
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
by: Shen, Yingli, et al.
Published: (2025)
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You
by: Friedrich, Felix, et al.
Published: (2024)
MMATH: A Multilingual Benchmark for Mathematical Reasoning
by: Luo, Wenyang, et al.
Published: (2025)
MUCH: A Multilingual Claim Hallucination Benchmark
by: Dentan, Jérémie, et al.
Published: (2025)
SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding
by: Hou, Shuyang, et al.
Published: (2026)
On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs
by: Ghorbanpour, Faeze, et al.
Published: (2025)