Bibliographic Details
Main Authors: Newman, Benjamin A., Gupta, Pranay, Kitani, Kris, Bisk, Yonatan, Admoni, Henny, Paxton, Chris
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Robotics
Online Access: https://arxiv.org/abs/2407.08876
author Newman, Benjamin A.
Gupta, Pranay
Kitani, Kris
Bisk, Yonatan
Admoni, Henny
Paxton, Chris
contents De gustibus non est disputandum ("there is no accounting for others' tastes") is a common Latin maxim describing how many solutions in life are determined by people's personal preferences. Many household tasks, in particular, can only be considered fully successful when they account for personal preferences such as the visual aesthetic of the scene. For example, setting a table could be optimized by arranging utensils according to traditional rules of Western table setting decorum, without considering the color, shape, or material of each object, but this may not be a completely satisfying solution for a given person. Toward this end, we present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference. To do this, we use internet-scale pre-trained vision-and-language foundation models (VLMs) with novel zero-shot visual prompting techniques. To evaluate our method, we collect a large dataset of naturalistic personal preferences in a simulated table-setting task, and conduct a user study in order to develop two novel metrics for determining success based on personal preference. This is a challenging problem and we find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.
format Preprint
id arxiv_https___arxiv_org_abs_2407_08876
institution arXiv
publishDate 2024
record_format arxiv
title DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement
topic Computer Vision and Pattern Recognition
Robotics
url https://arxiv.org/abs/2407.08876