Library Catalog

Bibliographic Details
Main Authors:	Martínez-Murillo, Ivan; Lloret, Elena; Moreda, Paloma; Gatt, Albert
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2509.06401

Similar Items

Exploring the Influence of Relevant Knowledge for Natural Language Generation Interpretability
by: Martínez-Murillo, Iván, et al.
Published: (2025)

It's the same but not the same: Do LLMs distinguish Spanish varieties?
by: Mayor-Rocher, Marina, et al.
Published: (2025)

Synthetic Eggs in Many Baskets: The Impact of Synthetic Data Diversity on LLM Fine-Tuning
by: Schaffelder, Max, et al.
Published: (2025)

Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
by: Song, Yingjin, et al.
Published: (2025)

Morphological Analysis for the Maltese Language: The Challenges of a Hybrid System
by: Borg, Claudia, et al.
Published: (2017)

A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences
by: Bertolazzi, Leonardo, et al.
Published: (2024)

CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
by: Beňová, Ivana, et al.
Published: (2024)

Probing Omissions and Distortions in Transformer-based RDF-to-Text Models
by: Faille, Juliette, et al.
Published: (2024)

Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
by: Derner, Erik, et al.
Published: (2024)

From Image Captioning to Visual Storytelling
by: Passadakis, Admitos, et al.
Published: (2025)

Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning
by: Song, Yingjin, et al.
Published: (2024)

Grounded Misunderstandings in Asymmetric Dialogue: A Perspectivist Annotation Scheme for MapTask
by: Li, Nan, et al.
Published: (2025)

Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
by: Lu, Dongxu, et al.
Published: (2025)

Contrast Is All You Need
by: Kilic, Burak, et al.
Published: (2023)

FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
by: Du, Yupei, et al.
Published: (2023)

How and where does CLIP process negation?
by: Quantmeyer, Vincent, et al.
Published: (2024)

VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
by: Wong, Hugh Mee, et al.
Published: (2025)

When Models Decide and When They Bind: A Two-Stage Computation for Multiple-Choice Question-Answering
by: Wong, Hugh Mee, et al.
Published: (2026)

Beyond Generative Artificial Intelligence: Roadmap for Natural Language Generation
by: Maestre, María Miró, et al.
Published: (2024)

Don't Learn, Ground: A Case for Natural Language Inference with Visual Grounding
by: Ignatev, Daniil, et al.
Published: (2025)

Common Objects Out of Context (COOCo): Investigating Multimodal Context and Semantic Scene Violations in Referential Communication
by: Merlo, Filippo, et al.
Published: (2025)

Do LLMs exhibit human-like response biases? A case study in survey design
by: Tjuatja, Lindia, et al.
Published: (2023)

Summarizing long regulatory documents with a multi-step pipeline
by: Sie, Mika, et al.
Published: (2024)

Do Multilingual LLMs have specialized language heads?
by: Naufil, Muhammad
Published: (2026)

Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
by: Salnikov, Mikhail, et al.
Published: (2025)

A Human-in/on-the-Loop Framework for Accessible Text Generation
by: Moreno, Lourdes, et al.
Published: (2026)

MedMobile: A mobile-sized language model with clinical capabilities
by: Vishwanath, Krithik, et al.
Published: (2024)

Predict the Next Word: Humans exhibit uncertainty in this task and language models _____
by: Ilia, Evgenia, et al.
Published: (2024)

Auxiliary task demands mask the capabilities of smaller language models
by: Hu, Jennifer, et al.
Published: (2024)

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
by: Bavaresco, Anna, et al.
Published: (2024)

References Matter: Investigating the Impact of Reference Set Variation on Summarization Evaluation
by: Casola, Silvia, et al.
Published: (2025)

Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs
by: Watson, Julia, et al.
Published: (2024)

Different types of syntactic agreement recruit the same units within large language models
by: Kryvosheieva, Daria, et al.
Published: (2025)

AIDBench: A benchmark for evaluating the authorship identification capability of large language models
by: Wen, Zichen, et al.
Published: (2024)

ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian
by: Syromiatnikov, Mykyta, et al.
Published: (2025)

A large-scale evaluation of commonsense knowledge in humans and large language models
by: Nguyen, Tuan Dung, et al.
Published: (2025)

Disentangling the Roles of Representation and Selection in Data Pruning
by: Du, Yupei, et al.
Published: (2025)

Text Difficulty Study: Do machines behave the same as humans regarding text difficulty?
by: Chen, Bowen, et al.
Published: (2022)

Do large language models resemble humans in language use?
by: Cai, Zhenguang G., et al.
Published: (2023)

Humans overrely on overconfident language models, across languages
by: Rathi, Neil, et al.
Published: (2025)