| Main Authors: | Huang, Yunchong; Barlacchi, Gianni; Pezzelle, Sandro |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.11938 |
Similar Items
Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!
by: Wildenburg, Frank, et al.
Published: (2024)
Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions
by: Surikuchi, Aditya K, et al.
Published: (2025)
Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models
by: Nakajima, Kumiko, et al.
Published: (2026)
How Language Models Conflate Logical Validity with Plausibility: A Representational Analysis of Content Effects
by: Bertolazzi, Leonardo, et al.
Published: (2025)
They want to pretend not to understand: The Limits of Current LLMs in Interpreting Implicit Content of Political Discourse
by: Paci, Walter, et al.
Published: (2025)
Naming, Describing, and Quantifying Visual Objects in Humans and LLMs
by: Testoni, Alberto, et al.
Published: (2024)
Are formal and functional linguistic mechanisms dissociated in language models?
by: Hanna, Michael, et al.
Published: (2025)
Describing Images *Fast and Slow*: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes
by: Takmaz, Ece, et al.
Published: (2024)
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
by: Chen, Xinyi, et al.
Published: (2023)
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)
Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments
by: Surikuchi, Aditya K, et al.
Published: (2026)
ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification
by: Wang, Zhensheng, et al.
Published: (2026)
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
by: Surikuchi, Aditya K, et al.
Published: (2024)
Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
by: Malaviya, Chaitanya, et al.
Published: (2024)
Is my model perplexed for the right reason? Contrasting LLMs' Benchmark Behavior with Token-Level Perplexity
by: Prins, Zoë, et al.
Published: (2026)
Vision-Language Models Align with Human Neural Representations in Concept Processing
by: Bavaresco, Anna, et al.
Published: (2024)
Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA
by: Roy, Nirmal, et al.
Published: (2024)
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
by: Pletenev, Sergey, et al.
Published: (2025)
QA-prompting: Improving Summarization with Large Language Models using Question-Answering
by: Sinha, Neelabh
Published: (2025)
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
by: Li, Jiaang, et al.
Published: (2026)
Improving QA Model Performance with Cartographic Inoculation
by: Chen, Allen, et al.
Published: (2024)
Context Selection and Rewriting for Video-based Educational Question Generation
by: Yu, Mengxia, et al.
Published: (2025)
PolQA: Polish Question Answering Dataset
by: Rybak, Piotr, et al.
Published: (2022)
Syn-QA2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets
by: Daswani, Ashwin, et al.
Published: (2024)
CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering
by: Wu, Yike, et al.
Published: (2024)
Socratic Reasoning Improves Positive Text Rewriting
by: Goel, Anmol, et al.
Published: (2024)
DebateQA: Evaluating Question Answering on Debatable Knowledge
by: Xu, Rongwu, et al.
Published: (2024)
UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting
by: Shields-Menard, Sara, et al.
Published: (2025)
RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering
by: Qiu, Weikang, et al.
Published: (2025)
EEE-QA: Exploring Effective and Efficient Question-Answer Representations
by: Hu, Zhanghao, et al.
Published: (2024)
M2QA: Multi-domain Multilingual Question Answering
by: Engländer, Leon, et al.
Published: (2024)
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
by: Pahilajani, Anish, et al.
Published: (2024)
Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions
by: Hou, Yutao, et al.
Published: (2024)
Towards Better Question Generation in QA-based Event Extraction
by: Hong, Zijin, et al.
Published: (2024)
MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering
by: Guan, Che, et al.
Published: (2024)
FoQA: A Faroese Question-Answering Dataset
by: Simonsen, Annika, et al.
Published: (2025)
ExpertQA: Expert-Curated Questions and Attributed Answers
by: Malaviya, Chaitanya, et al.
Published: (2023)
RetinaQA: A Robust Knowledge Base Question Answering Model for both Answerable and Unanswerable Questions
by: Faldu, Prayushi, et al.
Published: (2024)
RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering
by: Hudspeth, Marisa, et al.
Published: (2026)
DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
by: Kartha, Aaryaman, et al.
Published: (2025)