Staff View: When Contextual Inference Fails: Cancelability in Interactive Instruction Following :: Library Catalog

Bibliographic Details
Main Authors:	Bila, Natalia; Naszádi, Kata; Mayn, Alexandra; Monz, Christof
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.19997

_version_	1866917354824269824
author	Bila, Natalia; Naszádi, Kata; Mayn, Alexandra; Monz, Christof
author_facet	Bila, Natalia; Naszádi, Kata; Mayn, Alexandra; Monz, Christof
contents	We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve underspecified instructions using contextual inferences. Building on an existing two-speaker psycholinguistic paradigm -- which contrasts a pragmatically cooperative speaker with one who is only literally reliable -- we introduce Build What I Mean (BWIM), an interactive benchmark for contextual meaning construction. In BWIM, models must resolve ambiguity by either performing a contextual inference or requesting clarification at a small communication cost. Evaluating several state-of-the-art LLMs, we find a dissociation between judgment and action: while models detect speaker unreliability in explicit confidence ratings, they fail to exploit this information to guide efficient clarification behavior. Instead, we observe suboptimal strategies, such as partner-blind over-clarification and question-averse guessing under uncertainty.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_19997
institution	arXiv
publishDate	2026
record_format	arxiv
title	When Contextual Inference Fails: Cancelability in Interactive Instruction Following
topic	Computation and Language
url	https://arxiv.org/abs/2603.19997
