Bibliographic Details
Main Authors: Zeng, Peter, Li, Weiling, Paige, Amie, Wang, Zhengxiang, Kaliosis, Panagiotis, Samaras, Dimitris, Zelinsky, Gregory, Brennan, Susan, Rambow, Owen
Format: Preprint
Published: 2026
Subjects: Computation and Language, Artificial Intelligence, Human-Computer Interaction
Online Access: https://arxiv.org/abs/2601.19792
author Zeng, Peter
Li, Weiling
Paige, Amie
Wang, Zhengxiang
Kaliosis, Panagiotis
Samaras, Dimitris
Zelinsky, Gregory
Brennan, Susan
Rambow, Owen
contents For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. We present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We show that LVLMs cannot interactively generate and resolve referring expressions in a way that enables smooth communication, a crucial skill that underlies human language use. We release our corpus of 356 dialogues (89 pairs over 4 rounds each) along with the online pipeline for data collection and the tools for analyzing accuracy, efficiency, and lexical overlap.
format Preprint
id arxiv_https___arxiv_org_abs_2601_19792
institution arXiv
publishDate 2026
record_format arxiv
title LVLMs and Humans Ground Differently in Referential Communication
topic Computation and Language
Artificial Intelligence
Human-Computer Interaction
url https://arxiv.org/abs/2601.19792