Bibliographic Details
Main Authors: Lee, Taeyeop; Kang, Gyuree; Wen, Bowen; Kim, Youngho; Back, Seunghyeok; Kweon, In So; Shim, David Hyunchul; Yoon, Kuk-Jin
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2510.05662
Abstract:
  • Although people interact with transparent objects constantly in everyday life, research on robotic manipulation of transparent objects remains limited to short-horizon tasks and basic grasping. Some methods partially address these issues, but most generalize poorly to novel objects and are insufficient for precise long-horizon manipulation. To address these limitations, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural language task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the plan generated by a vision-language model (VLM) to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation. Project page: https://sites.google.com/view/DeLTa25/