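Staff View: (Computer) Vision in Action: Comparing Remote Sighted Assistance and a Multimodal Voice Agent in Inspection Sequences :: Library Catalog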

Bibliographic Details
Main Authors:	Rudaz, Damien; Carreras, Barbara Nino; Merlino, Sara; Due, Brian L.; Brown, Barry
Format:	Preprint
Published:	2026
Subjects:	Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2602.05671

_version_	1866915777849851904
author	Rudaz, Damien; Carreras, Barbara Nino; Merlino, Sara; Due, Brian L.; Brown, Barry
author_facet	Rudaz, Damien; Carreras, Barbara Nino; Merlino, Sara; Due, Brian L.; Brown, Barry
contents	Does human-AI assistance unfold in the same way as human-human assistance? This research explores what can be learned from the expertise of blind individuals and sighted volunteers to inform the design of multimodal voice agents and address the enduring challenge of proactivity. Drawing on granular analysis of two representative fragments from a larger corpus, we contrast the practices co-produced by an experienced human remote sighted assistant and a blind participant (as they collaborate to find a stain on a blanket over the phone) with those achieved when the same participant worked with a multimodal voice agent on the same task, a few moments earlier. This comparison enables us to specify precisely which fundamental proactive practices the agent did not enact in situ. We conclude that, so long as multimodal voice agents cannot produce environmentally occasioned vision-based actions, they will lack a key resource relied upon by human remote sighted assistants.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_05671
institution	arXiv
publishDate	2026
record_format	arxiv
title	(Computer) Vision in Action: Comparing Remote Sighted Assistance and a Multimodal Voice Agent in Inspection Sequences
topic	Human-Computer Interaction
url	https://arxiv.org/abs/2602.05671

Similar Items