| Main Authors: | Mayor-Rocher, Marina, Pozo, Cristina, Melero, Nina, Martínez, Gonzalo, Grandury, María, Reviriego, Pedro |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.20049 |
Similar Items
Spanish and LLM Benchmarks: is MMLU Lost in Translation?
by: Plaza, Irene, et al.
Published: (2024)
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
by: Mayor-Rocher, Marina, et al.
Published: (2024)
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)
Open Conversational LLMs do not know most Spanish words
by: Conde, Javier, et al.
Published: (2024)
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
by: Conde, Javier, et al.
Published: (2025)
Do LLMs exhibit the same commonsense capabilities across languages?
by: Martínez-Murillo, Ivan, et al.
Published: (2025)
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)
Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
by: Martínez, Gonzalo, et al.
Published: (2024)
Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?
by: Fu, Tairan, et al.
Published: (2026)
The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain
by: Grandury, María
Published: (2024)
Why Do Large Language Models (LLMs) Struggle to Count Letters?
by: Fu, Tairan, et al.
Published: (2024)
Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)
Text Difficulty Study: Do machines behave the same as humans regarding text difficulty?
by: Chen, Bowen, et al.
Published: (2022)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
by: Arriaga, Carlos, et al.
Published: (2025)
To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times
by: Clark, Thomas Hikaru, et al.
Published: (2026)
LLMs can hide text in other text of the same length
by: Norelli, Antonio, et al.
Published: (2025)
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
by: Grandury, María, et al.
Published: (2025)
Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal
by: Martínez, Gonzalo, et al.
Published: (2024)
Can ChatGPT Learn to Count Letters?
by: Conde, Javier, et al.
Published: (2025)
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)
Does Burrows' Delta really confirm that Rowling and Galbraith are the same author?
by: Orekhov, Boris
Published: (2024)
Different types of syntactic agreement recruit the same units within large language models
by: Kryvosheieva, Daria, et al.
Published: (2025)
Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
by: Martínez, Gonzalo, et al.
Published: (2023)
Verifying Graph Algorithms in Separation Logic: A Case for an Algebraic Approach (Extended Version)
by: Grandury, Marcos, et al.
Published: (2025)
Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
by: Awad, Samer, et al.
Published: (2026)
How does fine-tuning improve sensorimotor representations in large language models?
by: Wu, Minghua, et al.
Published: (2026)
Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance
by: Peter, Silvan David, et al.
Published: (2023)
Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation
by: Stewart, Ian, et al.
Published: (2024)
Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models
by: Hosseini-Kivanani, Nina
Published: (2026)
The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs
by: Sant, Aleix, et al.
Published: (2024)
Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?
by: Fu, Tairan, et al.
Published: (2025)
Speed and Conversational Large Language Models: Not All Is About Tokens per Second
by: Conde, Javier, et al.
Published: (2025)
On convergence empirics: same evidence for Spanish regions
by: Ana Lamo
Published: (2000)
Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)
Training language models to be warm and empathetic makes them less reliable and more sycophantic
by: Ibrahim, Lujain, et al.
Published: (2025)
Into the crossfire: evaluating the use of a language model to crowdsource gun violence reports
by: Belisario, Adriano, et al.
Published: (2024)
Overview of ADoBo at IberLEF 2025: Automatic Detection of Anglicisms in Spanish
by: Alvarez-Mellado, Elena, et al.
Published: (2025)
Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
by: Hafner, Franziska Sofia, et al.
Published: (2025)
Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
by: Almasi, Mina, et al.
Published: (2025)
Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs
by: Kawasaki, Yoshifumi
Published: (2026)