Bibliographic Details
Main Authors: Zhang, Zhemeng; Ma, Jiahua; Yang, Xincheng; Wen, Xin; Zhang, Yuzhi; Li, Boyan; Qin, Yiran; Liu, Jin; Zhao, Can; Kang, Li; Hong, Haoqin; Yin, Zhenfei; Torr, Philip; Su, Hao; Zhang, Ruimao; Ma, Daolin
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2601.20239
_version_ 1866909038781923328
author Zhang, Zhemeng
Ma, Jiahua
Yang, Xincheng
Wen, Xin
Zhang, Yuzhi
Li, Boyan
Qin, Yiran
Liu, Jin
Zhao, Can
Kang, Li
Hong, Haoqin
Yin, Zhenfei
Torr, Philip
Su, Hao
Zhang, Ruimao
Ma, Daolin
contents Fine-grained, contact-rich manipulation remains challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
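note The two-stage sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the straight-line velocity field, the quadratic feasibility score, the gradient-based form of the guidance, and every name and parameter below (`base_velocity`, `feasibility_grad`, `guided_sample`, `guide_from`, `scale`) are hypothetical stand-ins chosen for illustration. The early steps follow the vision-only policy, and the later steps add the gradient of a feasibility score to the velocity.

```python
import numpy as np


def base_velocity(action, t, visual_target):
    """Toy stand-in for a pre-trained flow-matching policy (assumption).

    Uses the straight-line flow toward `visual_target`,
    v = (target - action) / (1 - t), clamped near t = 1.
    """
    return (visual_target - action) / max(1.0 - t, 1e-3)


def feasibility_grad(action, contact_target):
    """Gradient of a toy feasibility score (assumption).

    Score: s(a) = -0.5 * ||a - contact_target||^2, so grad s = contact_target - a.
    A learned contact model would replace this in practice.
    """
    return contact_target - action


def guided_sample(visual_target, contact_target, steps=20,
                  guide_from=0.5, scale=2.0, seed=0):
    """Euler-integrate the flow from noise, adding guidance once t >= guide_from."""
    rng = np.random.default_rng(seed)
    action = rng.standard_normal(visual_target.shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = base_velocity(action, t, visual_target)
        if t >= guide_from:
            # Stage 2: steer toward actions the feasibility score prefers.
            v = v + scale * feasibility_grad(action, contact_target)
        action = action + dt * v
    return action


if __name__ == "__main__":
    visual = np.array([0.0, 0.0])   # where the vision-only policy would land
    contact = np.array([1.0, 0.0])  # where the toy feasibility score peaks
    print("unguided:", guided_sample(visual, contact, scale=0.0))
    print("guided:  ", guided_sample(visual, contact, scale=2.0))
```

With `scale=0.0` the sampler reduces to the unguided policy and ends at the visual target. With a positive `scale` the final action shifts toward the region the feasibility score prefers. The shift is modest in this toy because the straight-line flow pulls strongly toward its own target in the last step.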
format Preprint
id arxiv_https___arxiv_org_abs_2601_20239
institution arXiv
publishDate 2026
record_format arxiv
title TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
topic Robotics
url https://arxiv.org/abs/2601.20239