Staff View: LVLMs and Humans Ground Differently in Referential Communication

Bibliographic Details
Main Authors:	Zeng, Peter; Li, Weiling; Paige, Amie; Wang, Zhengxiang; Kaliosis, Panagiotis; Samaras, Dimitris; Zelinsky, Gregory; Brennan, Susan; Rambow, Owen
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2601.19792

_version_	1866914490719666176
author	Zeng, Peter; Li, Weiling; Paige, Amie; Wang, Zhengxiang; Kaliosis, Panagiotis; Samaras, Dimitris; Zelinsky, Gregory; Brennan, Susan; Rambow, Owen
author_facet	Zeng, Peter; Li, Weiling; Paige, Amie; Wang, Zhengxiang; Kaliosis, Panagiotis; Samaras, Dimitris; Zelinsky, Gregory; Brennan, Susan; Rambow, Owen
contents	For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. We present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We show that LVLMs cannot interactively generate and resolve referring expressions in a way that enables smooth communication, a crucial skill that underlies human language use. We release our corpus of 356 dialogues (89 pairs over 4 rounds each) along with the online pipeline for data collection and the tools for analyzing accuracy, efficiency, and lexical overlap.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_19792
institution	arXiv
publishDate	2026
record_format	arxiv
title	LVLMs and Humans Ground Differently in Referential Communication
topic	Computation and Language; Artificial Intelligence; Human-Computer Interaction
url	https://arxiv.org/abs/2601.19792
