Staff View: Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction :: Library Catalog

Bibliographic Details
Main Authors:	Burghouts, Gertjan; Schaaphok, Marianne; van Bekkum, Michael; Meijer, Wouter; Hillerström, Fieke; van Mil, Jelle
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.13368

_version_	1866916329431236608
author	Burghouts, Gertjan; Schaaphok, Marianne; van Bekkum, Michael; Meijer, Wouter; Hillerström, Fieke; van Mil, Jelle
author_facet	Burghouts, Gertjan; Schaaphok, Marianne; van Bekkum, Michael; Meijer, Wouter; Hillerström, Fieke; van Mil, Jelle
contents	Mobile robot platforms will increasingly be tasked with activities that involve grasping and manipulating objects in open-world environments. Affordance understanding provides a robot with the means to realise its goals and execute its tasks, e.g. to achieve autonomous navigation in unknown buildings where it has to find doors and ways to open them. To get actionable suggestions, robots need to be able to distinguish subtle differences between objects, as these may result in different action sequences: doorknobs require grasp and twist, while handlebars require grasp and push. In this paper, we improve affordance perception for a robot in an open-world setting. Our contribution is threefold: (1) We provide an affordance representation with precise, actionable affordances; (2) We connect this knowledge base to a foundational vision-language model (VLM) and prompt the VLM for a wider variety of new and unseen objects; (3) We apply a human-in-the-loop for corrections on the output of the VLM. The mix of affordance representation, image detection and a human-in-the-loop is effective for a robot to search for objects to achieve its goals. We have demonstrated this in a scenario of finding various doors and the many different ways to open them.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_13368
institution	arXiv
publishDate	2024
record_format	arxiv
title	Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2407.13368

Similar Items