Bibliographic Details
Main Authors: Chen, Baijun; Wan, Weijie; Chen, Tianxing; Guo, Xianda; Xu, Congsheng; Qi, Yuanyang; Zhang, Haojie; Wu, Longyan; Xu, Tianling; Li, Zixuan; Wu, Yizhe; Li, Rui; Yang, Xiaokang; Luo, Ping; Sui, Wei; Mu, Yao
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.10093
Abstract:
  • Robotic manipulation has seen rapid progress with vision-language-action (VLA) policies. However, visuo-tactile perception is critical for contact-rich manipulation, as tasks such as insertion are difficult to complete robustly using vision alone. At the same time, acquiring large-scale and reliable tactile data in the physical world remains costly and challenging, and the lack of a unified evaluation platform further limits policy learning and systematic analysis. To address these challenges, we propose UniVTAC, a simulation-based visuo-tactile data synthesis platform that supports three commonly used visuo-tactile sensors and enables scalable and controllable generation of informative contact interactions. Based on this platform, we introduce the UniVTAC Encoder, a visuo-tactile encoder trained on large-scale simulation-synthesized data with designed supervisory signals, providing tactile-centric visuo-tactile representations for downstream manipulation tasks. In addition, we present the UniVTAC Benchmark, which consists of eight representative visuo-tactile manipulation tasks for evaluating tactile-driven policies. Experimental results show that integrating the UniVTAC Encoder improves average success rates by 17.1% on the UniVTAC Benchmark, while real-world robotic experiments further demonstrate a 25% improvement in task success. Our webpage is available at https://univtac.github.io/.