| Main Authors: | Fu, Tairan, Ferrando, Raquel, Conde, Javier, Arriaga, Carlos, Reviriego, Pedro |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.18626 |
Similar Items
Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)
Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
by: Awad, Samer, et al.
Published: (2026)
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)
Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?
by: Fu, Tairan, et al.
Published: (2025)
Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?
by: Fu, Tairan, et al.
Published: (2026)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
by: Arriaga, Carlos, et al.
Published: (2025)
To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times
by: Clark, Thomas Hikaru, et al.
Published: (2026)
Can ChatGPT Learn to Count Letters?
by: Conde, Javier, et al.
Published: (2025)
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)
Open Conversational LLMs do not know most Spanish words
by: Conde, Javier, et al.
Published: (2024)
Beyond Reproducibility: Token Probabilities Expose Large Language Model Nondeterminism
by: Fu, Tairan, et al.
Published: (2026)
Lost in the Vibrations: Vision Language Models Fail the Dynamic Gauges Test
by: Fu, Tairan, et al.
Published: (2026)
Speed and Conversational Large Language Models: Not All Is About Tokens per Second
by: Conde, Javier, et al.
Published: (2025)
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
by: Mayor-Rocher, Marina, et al.
Published: (2024)
Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
by: Martínez, Gonzalo, et al.
Published: (2024)
Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)
How does fine-tuning improve sensorimotor representations in large language models?
by: Wu, Minghua, et al.
Published: (2026)
Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
by: Martínez, Gonzalo, et al.
Published: (2023)
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
by: Conde, Javier, et al.
Published: (2025)
It's the same but not the same: Do LLMs distinguish Spanish varieties?
by: Mayor-Rocher, Marina, et al.
Published: (2025)
Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal
by: Martínez, Gonzalo, et al.
Published: (2024)
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
by: Ferrando, Javier, et al.
Published: (2024)
Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
by: Sobotka, Jan, et al.
Published: (2026)
Information Flow Routes: Automatically Interpreting Language Models at Scale
by: Ferrando, Javier, et al.
Published: (2024)
Assessing Latency in ASR Systems: A Methodological Perspective for Real-Time Use
by: Arriaga, Carlos, et al.
Published: (2024)
Automated Interpretability and Feature Discovery in Language Models with Agents
by: Marin-Llobet, Arnau, et al.
Published: (2026)
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
by: Zhu, Yuqi, et al.
Published: (2025)
Spanish and LLM Benchmarks: is MMLU Lost in Translation?
by: Plaza, Irene, et al.
Published: (2024)
Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
by: Li, Pengxiang, et al.
Published: (2026)
Large Language Models Struggle with Unreasonability in Math Problems
by: Ma, Jingyuan, et al.
Published: (2024)
Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes
by: Chen, Liuliu, et al.
Published: (2026)
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
by: Tufanov, Igor, et al.
Published: (2024)
Stochastic Streets: A Walk Through Random LLM Address Generation in four European Cities
by: Fu, Tairan, et al.
Published: (2025)
On the Similarity of Circuits across Languages: a Case Study on the Subject-verb Agreement Task
by: Ferrando, Javier, et al.
Published: (2024)
Artificial Intelligence and Misinformation in Art: Can Vision Language Models Judge the Hand or the Machine Behind the Canvas?
by: Fu, Tarian, et al.
Published: (2025)
Gender Inequality in English Textbooks Around the World: an NLP Approach
by: Liu, Tairan
Published: (2025)
A Primer on the Inner Workings of Transformer-based Language Models
by: Ferrando, Javier, et al.
Published: (2024)
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
by: Huang, Kung-Hsiang, et al.
Published: (2025)
The LLM Bottleneck: Why Open-Source Vision LLMs Struggle with Hierarchical Visual Recognition
by: Tan, Yuwen, et al.
Published: (2025)