Library Catalog

Bibliographic Details
Main Authors:	Mayor-Rocher, Marina, Pozo, Cristina, Melero, Nina, Martínez, Gonzalo, Grandury, María, Reviriego, Pedro
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.20049

Similar Items

Spanish and LLM Benchmarks: is MMLU Lost in Translation?
by: Plaza, Irene, et al.
Published: (2024)

Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
by: Mayor-Rocher, Marina, et al.
Published: (2024)

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)

Open Conversational LLMs do not know most Spanish words
by: Conde, Javier, et al.
Published: (2024)

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
by: Conde, Javier, et al.
Published: (2025)

Do LLMs exhibit the same commonsense capabilities across languages?
by: Martínez-Murillo, Ivan, et al.
Published: (2025)

Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)

Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
by: Martínez, Gonzalo, et al.
Published: (2024)

Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?
by: Fu, Tairan, et al.
Published: (2026)

The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain
by: Grandury, María
Published: (2024)

Why Do Large Language Models (LLMs) Struggle to Count Letters?
by: Fu, Tairan, et al.
Published: (2024)

Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)

Text Difficulty Study: Do machines behave the same as humans regarding text difficulty?
by: Chen, Bowen, et al.
Published: (2022)

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
by: Arriaga, Carlos, et al.
Published: (2025)

To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times
by: Clark, Thomas Hikaru, et al.
Published: (2026)

LLMs can hide text in other text of the same length
by: Norelli, Antonio, et al.
Published: (2025)

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
by: Grandury, María, et al.
Published: (2025)

Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal
by: Martínez, Gonzalo, et al.
Published: (2024)

Can ChatGPT Learn to Count Letters?
by: Conde, Javier, et al.
Published: (2025)

Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)

Does Burrows' Delta really confirm that Rowling and Galbraith are the same author?
by: Orekhov, Boris
Published: (2024)

Different types of syntactic agreement recruit the same units within large language models
by: Kryvosheieva, Daria, et al.
Published: (2025)

Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
by: Martínez, Gonzalo, et al.
Published: (2023)

Verifying Graph Algorithms in Separation Logic: A Case for an Algebraic Approach (Extended Version)
by: Grandury, Marcos, et al.
Published: (2025)

Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
by: Awad, Samer, et al.
Published: (2026)

How does fine-tuning improve sensorimotor representations in large language models?
by: Wu, Minghua, et al.
Published: (2026)

Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance
by: Peter, Silvan David, et al.
Published: (2023)

Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation
by: Stewart, Ian, et al.
Published: (2024)

Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models
by: Hosseini-Kivanani, Nina
Published: (2026)

The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs
by: Sant, Aleix, et al.
Published: (2024)

Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?
by: Fu, Tairan, et al.
Published: (2025)

Speed and Conversational Large Language Models: Not All Is About Tokens per Second
by: Conde, Javier, et al.
Published: (2025)

On convergence empirics: same evidence for Spanish regions
by: Lamo, Ana
Published: (2000)

Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)

Training language models to be warm and empathetic makes them less reliable and more sycophantic
by: Ibrahim, Lujain, et al.
Published: (2025)

Into the crossfire: evaluating the use of a language model to crowdsource gun violence reports
by: Belisario, Adriano, et al.
Published: (2024)

Overview of ADoBo at IberLEF 2025: Automatic Detection of Anglicisms in Spanish
by: Alvarez-Mellado, Elena, et al.
Published: (2025)

Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
by: Hafner, Franziska Sofia, et al.
Published: (2025)

Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
by: Almasi, Mina, et al.
Published: (2025)

Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs
by: Kawasaki, Yoshifumi
Published: (2026)