| Main Authors: | Plaza, Irene, Melero, Nina, del Pozo, Cristina, Conde, Javier, Reviriego, Pedro, Mayor-Rocher, Marina, Grandury, María |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.17789 |
Similar Items
It's the same but not the same: Do LLMs distinguish Spanish varieties?
by: Mayor-Rocher, Marina, et al.
Published: (2025)
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
by: Mayor-Rocher, Marina, et al.
Published: (2024)
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
by: Conde, Javier, et al.
Published: (2025)
Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
by: Awad, Samer, et al.
Published: (2026)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
by: Arriaga, Carlos, et al.
Published: (2025)
How does fine-tuning improve sensorimotor representations in large language models?
by: Wu, Minghua, et al.
Published: (2026)
Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)
Can ChatGPT Learn to Count Letters?
by: Conde, Javier, et al.
Published: (2025)
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)
Speed and Conversational Large Language Models: Not All Is About Tokens per Second
by: Conde, Javier, et al.
Published: (2025)
Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)
Are We Done with MMLU?
by: Gema, Aryo Pradipta, et al.
Published: (2024)
Open Conversational LLMs do not know most Spanish words
by: Conde, Javier, et al.
Published: (2024)
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark
by: Bsharat, Sondos Mahmoud, et al.
Published: (2025)
Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?
by: Fu, Tairan, et al.
Published: (2026)
DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models
by: Altakrori, Malik H., et al.
Published: (2025)
Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
by: Ghahroodi, Omid, et al.
Published: (2024)
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
by: KJ, Sankalp, et al.
Published: (2025)
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
by: Zhao, Qihao, et al.
Published: (2024)
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models
by: Wang, Wentian, et al.
Published: (2024)
Stochastic Streets: A Walk Through Random LLM Address Generation in four European Cities
by: Fu, Tairan, et al.
Published: (2025)
Assessing Latency in ASR Systems: A Methodological Perspective for Real-Time Use
by: Arriaga, Carlos, et al.
Published: (2024)
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
by: Dunham, TJ, et al.
Published: (2024)
Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan
by: Chatterjee, Ahan, et al.
Published: (2026)
Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
by: Zhao, Juntu, et al.
Published: (2024)
Lost in Translation: The Algorithmic Gap Between LMs and the Brain
by: Tosato, Tommaso, et al.
Published: (2024)
Lost in Translation? A Comparative Study on the Cross-Lingual Transfer of Composite Harms
by: Shukla, Vaibhav, et al.
Published: (2026)
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
by: Zhao, Justin, et al.
Published: (2024)
Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
by: Huang, Xu, et al.
Published: (2024)
Real-time Spatial Retrieval Augmented Generation for Urban Environments
by: Campo, David Nazareno, et al.
Published: (2025)
Understanding the Impact of Artificial Intelligence in Academic Writing: Metadata to the Rescue
by: Conde, Javier, et al.
Published: (2025)
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)
Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News
by: López, Alejandro Buitrago, et al.
Published: (2026)
Beyond Reproducibility: Token Probabilities Expose Large Language Model Nondeterminism
by: Fu, Tairan, et al.
Published: (2026)
Training language models to be warm and empathetic makes them less reliable and more sycophantic
by: Ibrahim, Lujain, et al.
Published: (2025)
Energy-Efficient Stochastic Computing (SC) Neural Networks for Internet of Things Devices With Layer-Wise Adjustable Sequence Length (ASL)
by: Wang, Ziheng, et al.
Published: (2025)
Improving Low-Resource Translation with Dictionary-Guided Fine-Tuning and RL: A Spanish-to-Wayuunaiki Study
by: Mosquera, Manuel, et al.
Published: (2025)
Improving LLM Abilities in Idiomatic Translation
by: Donthi, Sundesh, et al.
Published: (2024)
How and Where to Translate? The Impact of Translation Strategies in Cross-lingual LLM Prompting
by: Gupta, Aman, et al.
Published: (2025)