Staff View: UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios :: Library Catalog

Bibliographic Details
Main Authors:	Gonzalez, Antonio Galiza Cerdeira; Gajewski, Paweł; Indurkhya, Bipin
Format:	Preprint
Published:	2024
Subjects:	Robotics; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.06355

_version_	1866917472509100032
author	Gonzalez, Antonio Galiza Cerdeira; Gajewski, Paweł; Indurkhya, Bipin
author_facet	Gonzalez, Antonio Galiza Cerdeira; Gajewski, Paweł; Indurkhya, Bipin
contents	This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information (speech, gestures, and scene context) to extract structured, actionable instructions for robots. Addressing the need for general-purpose human-robot interaction in domestic environments, UNCOM is designed for zero-shot operation, without reliance on predefined object models or training data specific to a given task. Using foundational and task-specific deep learning models, it provides out-of-the-box speech recognition, natural language understanding, gesture detection, and object segmentation. The modular architecture enhances transparency and explainability by explicitly parsing commands into object-action-target representations, enabling integration with symbolic robotic frameworks. We demonstrate the system on a TIAGo++ robot and evaluate it on a real-world data set of human-robot interaction scenarios, achieving an 82.39% success rate on our benchmark data set, which highlights the robustness of the system to diversity, noise, and communication ambiguity. The data set, evaluation scenarios, and code are publicly available to support future research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_06355
institution	arXiv
publishDate	2024
record_format	arxiv
title	UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios
topic	Robotics; Artificial Intelligence
url	https://arxiv.org/abs/2410.06355

Similar Items