| Main Authors: | Kostiuk, Yevhen, Vitman, Oxana, Gagała, Łukasz, Kiulian, Artur |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.09154 |
Similar Items
The Veln(ia)s is in the Details: Evaluating LLM Judgment on Latvian and Lithuanian Short Answer Matching
by: Kostiuk, Yevhen, et al.
Published: (2025)
From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages
by: Kiulian, Artur, et al.
Published: (2024)
Dialectical Behavior Therapy Approach to LLM Prompting
by: Vitman, Oxana, et al.
Published: (2024)
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation
by: Kiulian, Artur, et al.
Published: (2024)
Towards Multilingual LLM Evaluation for European Languages
by: Thellmann, Klaudia, et al.
Published: (2024)
One prompt is not enough: Instruction Sensitivity Undermines Embedding Model Evaluation
by: Kostiuk, Yevhen, et al.
Published: (2026)
Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation
by: Kreutzer, Julia, et al.
Published: (2025)
Implementing a Nordic-Baltic Federated Health Data Network: a case report
by: Chomutare, Taridzo, et al.
Published: (2024)
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
by: Pattnayak, Priyaranjan, et al.
Published: (2026)
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
by: Yoo, Haneul, et al.
Published: (2024)
Abstractive Summarization of Low resourced Nepali language using Multilingual Transformers
by: Dhakal, Prakash, et al.
Published: (2024)
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback
by: Tonga, Junior Cedric, et al.
Published: (2025)
Open Llama2 Model for the Lithuanian Language
by: Nakvosas, Artūras, et al.
Published: (2024)
LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs
by: Guo, Pei-Fu, et al.
Published: (2025)
Multilingual LLMs Are Not Multilingual Thinkers: Evidence from Hindi Analogy Evaluation
by: Gupta, Ashray, et al.
Published: (2025)
Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages
by: Khan, Shaharukh, et al.
Published: (2026)
Backtranslation and paraphrasing in the LLM era? Comparing data augmentation methods for emotion classification
by: Radliński, Łukasz, et al.
Published: (2025)
Measuring Moral LLM Responses in Multilingual Capacities
by: Basu, Kimaya, et al.
Published: (2025)
Multilingual Large Language Models do not comprehend all natural languages to equal degrees
by: Moskvina, Natalia, et al.
Published: (2026)
M-Prometheus: A Suite of Open Multilingual LLM Judges
by: Pombal, José, et al.
Published: (2025)
RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations
by: Gwozdz, Jonas, et al.
Published: (2025)
MELA: Multilingual Evaluation of Linguistic Acceptability
by: Zhang, Ziyin, et al.
Published: (2023)
Is Multilingual LLM Watermarking Truly Multilingual? Scaling Robustness to 100+ Languages via Back-Translation
by: Mohamed, Asim, et al.
Published: (2025)
Towards Safe Multilingual Frontier AI
by: Kanepajs, Artūrs, et al.
Published: (2024)
Multilingual jailbreaking of LLMs using low-resource languages
by: Marx, Dylan, et al.
Published: (2026)
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
by: Sitaram, Sunayana, et al.
Published: (2025)
Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation
by: Alam, Firoj, et al.
Published: (2026)
M-IFEval: Multilingual Instruction-Following Evaluation
by: Dussolle, Antoine, et al.
Published: (2025)
The Roles of English in Evaluating Multilingual Language Models
by: Poelman, Wessel, et al.
Published: (2024)
Translation as a Scalable Proxy for Multilingual Evaluation
by: Issaka, Sheriff, et al.
Published: (2026)
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
by: Chen, Pinzhen, et al.
Published: (2024)
FIBER: A Multilingual Evaluation Resource for Factual Inference Bias
by: Munis, Evren Ayberk, et al.
Published: (2025)
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs
by: Rystrøm, Jonathan, et al.
Published: (2025)
Towards Understanding the Robustness of LLM-based Evaluations under Perturbations
by: Chaudhary, Manav, et al.
Published: (2024)
Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
by: Wu, Wanxing, et al.
Published: (2026)
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
by: Borchmann, Łukasz
Published: (2024)
Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History
by: Zhong, Qishuai, et al.
Published: (2025)
Toward Robust Multilingual Adaptation of LLMs for Low-Resource Languages
by: Li, Haolin, et al.
Published: (2025)
Multilingual Training and Evaluation Resources for Vision-Language Models
by: Baiamonte, Daniela, et al.
Published: (2026)
CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics
by: Agarwal, Parth, et al.
Published: (2025)