:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Yunchong; Barlacchi, Gianni; Pezzelle, Sandro
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.11938

Similar Items

Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!
by: Wildenburg, Frank, et al.
Published: (2024)

Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions
by: Surikuchi, Aditya K, et al.
Published: (2025)

Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models
by: Nakajima, Kumiko, et al.
Published: (2026)

How Language Models Conflate Logical Validity with Plausibility: A Representational Analysis of Content Effects
by: Bertolazzi, Leonardo, et al.
Published: (2025)

They want to pretend not to understand: The Limits of Current LLMs in Interpreting Implicit Content of Political Discourse
by: Paci, Walter, et al.
Published: (2025)

Naming, Describing, and Quantifying Visual Objects in Humans and LLMs
by: Testoni, Alberto, et al.
Published: (2024)

Are formal and functional linguistic mechanisms dissociated in language models?
by: Hanna, Michael, et al.
Published: (2025)

Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes
by: Takmaz, Ece, et al.
Published: (2024)

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
by: Chen, Xinyi, et al.
Published: (2023)

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments
by: Surikuchi, Aditya K, et al.
Published: (2026)

ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification
by: Wang, Zhensheng, et al.
Published: (2026)

Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
by: Surikuchi, Aditya K, et al.
Published: (2024)

Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
by: Malaviya, Chaitanya, et al.
Published: (2024)

Is my model perplexed for the right reason? Contrasting LLMs' Benchmark Behavior with Token-Level Perplexity
by: Prins, Zoë, et al.
Published: (2026)

Vision-Language Models Align with Human Neural Representations in Concept Processing
by: Bavaresco, Anna, et al.
Published: (2024)

Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA
by: Roy, Nirmal, et al.
Published: (2024)

Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
by: Pletenev, Sergey, et al.
Published: (2025)

QA-prompting: Improving Summarization with Large Language Models using Question-Answering
by: Sinha, Neelabh
Published: (2025)

Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
by: Li, Jiaang, et al.
Published: (2026)

Improving QA Model Performance with Cartographic Inoculation
by: Chen, Allen, et al.
Published: (2024)

Context Selection and Rewriting for Video-based Educational Question Generation
by: Yu, Mengxia, et al.
Published: (2025)

PolQA: Polish Question Answering Dataset
by: Rybak, Piotr, et al.
Published: (2022)

Syn-QA2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets
by: Daswani, Ashwin, et al.
Published: (2024)

CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering
by: Wu, Yike, et al.
Published: (2024)

Socratic Reasoning Improves Positive Text Rewriting
by: Goel, Anmol, et al.
Published: (2024)

DebateQA: Evaluating Question Answering on Debatable Knowledge
by: Xu, Rongwu, et al.
Published: (2024)

UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting
by: Shields-Menard, Sara, et al.
Published: (2025)

RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering
by: Qiu, Weikang, et al.
Published: (2025)

EEE-QA: Exploring Effective and Efficient Question-Answer Representations
by: Hu, Zhanghao, et al.
Published: (2024)

M2QA: Multi-domain Multilingual Question Answering
by: Engländer, Leon, et al.
Published: (2024)

GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
by: Pahilajani, Anish, et al.
Published: (2024)

Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions
by: Hou, Yutao, et al.
Published: (2024)

Towards Better Question Generation in QA-based Event Extraction
by: Hong, Zijin, et al.
Published: (2024)

MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering
by: Guan, Che, et al.
Published: (2024)

FoQA: A Faroese Question-Answering Dataset
by: Simonsen, Annika, et al.
Published: (2025)

ExpertQA: Expert-Curated Questions and Attributed Answers
by: Malaviya, Chaitanya, et al.
Published: (2023)

RetinaQA: A Robust Knowledge Base Question Answering Model for both Answerable and Unanswerable Questions
by: Faldu, Prayushi, et al.
Published: (2024)

RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering
by: Hudspeth, Marisa, et al.
Published: (2026)

DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
by: Kartha, Aaryaman, et al.
Published: (2025)