Library Catalog

Bibliographic Details
Main Authors:	Plaza, Irene, Melero, Nina, del Pozo, Cristina, Conde, Javier, Reviriego, Pedro, Mayor-Rocher, Marina, Grandury, María
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.17789

Similar Items

It's the same but not the same: Do LLMs distinguish Spanish varieties?
by: Mayor-Rocher, Marina, et al.
Published: (2025)

Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
by: Mayor-Rocher, Marina, et al.
Published: (2024)

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
by: Conde, Javier, et al.
Published: (2025)

Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
by: Awad, Samer, et al.
Published: (2026)

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
by: Arriaga, Carlos, et al.
Published: (2025)

How does fine-tuning improve sensorimotor representations in large language models?
by: Wu, Minghua, et al.
Published: (2026)

Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)

Can ChatGPT Learn to Count Letters?
by: Conde, Javier, et al.
Published: (2025)

Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)

Speed and Conversational Large Language Models: Not All Is About Tokens per Second
by: Conde, Javier, et al.
Published: (2025)

Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)

Are We Done with MMLU?
by: Gema, Aryo Pradipta, et al.
Published: (2024)

Open Conversational LLMs do not know most Spanish words
by: Conde, Javier, et al.
Published: (2024)

Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark
by: Bsharat, Sondos Mahmoud, et al.
Published: (2025)

Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?
by: Fu, Tairan, et al.
Published: (2026)

DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models
by: Altakrori, Malik H., et al.
Published: (2025)

Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
by: Ghahroodi, Omid, et al.
Published: (2024)

IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
by: KJ, Sankalp, et al.
Published: (2025)

MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
by: Zhao, Qihao, et al.
Published: (2024)

MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models
by: Wang, Wentian, et al.
Published: (2024)

Stochastic Streets: A Walk Through Random LLM Address Generation in four European Cities
by: Fu, Tairan, et al.
Published: (2025)

Assessing Latency in ASR Systems: A Methodological Perspective for Real-Time Use
by: Arriaga, Carlos, et al.
Published: (2024)

Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
by: Dunham, TJ, et al.
Published: (2024)

Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan
by: Chatterjee, Ahan, et al.
Published: (2026)

Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
by: Zhao, Juntu, et al.
Published: (2024)

Lost in Translation: The Algorithmic Gap Between LMs and the Brain
by: Tosato, Tommaso, et al.
Published: (2024)

Lost in Translation? A Comparative Study on the Cross-Lingual Transfer of Composite Harms
by: Shukla, Vaibhav, et al.
Published: (2026)

Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
by: Zhao, Justin, et al.
Published: (2024)

Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
by: Huang, Xu, et al.
Published: (2024)

Real-time Spatial Retrieval Augmented Generation for Urban Environments
by: Campo, David Nazareno, et al.
Published: (2025)

Understanding the Impact of Artificial Intelligence in Academic Writing: Metadata to the Rescue
by: Conde, Javier, et al.
Published: (2025)

Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)

Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News
by: López, Alejandro Buitrago, et al.
Published: (2026)

Beyond Reproducibility: Token Probabilities Expose Large Language Model Nondeterminism
by: Fu, Tairan, et al.
Published: (2026)

Training language models to be warm and empathetic makes them less reliable and more sycophantic
by: Ibrahim, Lujain, et al.
Published: (2025)

Energy-Efficient Stochastic Computing (SC) Neural Networks for Internet of Things Devices With Layer-Wise Adjustable Sequence Length (ASL)
by: Wang, Ziheng, et al.
Published: (2025)

Improving Low-Resource Translation with Dictionary-Guided Fine-Tuning and RL: A Spanish-to-Wayuunaiki Study
by: Mosquera, Manuel, et al.
Published: (2025)

Improving LLM Abilities in Idiomatic Translation
by: Donthi, Sundesh, et al.
Published: (2024)

How and Where to Translate? The Impact of Translation Strategies in Cross-lingual LLM Prompting
by: Gupta, Aman, et al.
Published: (2025)