MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Zong, Junyi, Jia, Qingxuan, Shi, Meixian, Li, Tong, Li, Jiayuan, Lv, Zihang, Chen, Gang, Deng, Fang
Natura:	Preprint
Pubblicazione:	2026
Soggetti:	Computer Vision and Pattern Recognition Artificial Intelligence Robotics
Accesso online:	https://arxiv.org/abs/2604.03322
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866913004116770816
author	Zong, Junyi Jia, Qingxuan Shi, Meixian Li, Tong Li, Jiayuan Lv, Zihang Chen, Gang Deng, Fang
author_facet	Zong, Junyi Jia, Qingxuan Shi, Meixian Li, Tong Li, Jiayuan Lv, Zihang Chen, Gang Deng, Fang
contents	Quality inspection in smart manufacturing requires identifying intrinsic material and surface properties beyond visible geometry, yet vision-only methods remain vulnerable to occlusion and reflection. We propose VitaTouch, a property-aware vision-tactile-language model for material-property inference and natural-language attribute description. VitaTouch uses modality-specific encoders and a dual Q-Former to extract language-relevant visual and tactile features, which are compressed into prefix tokens for a large language model. We align each modality with text and explicitly couple vision and touch through contrastive learning. We also construct VitaSet, a multimodal dataset with 186 objects, 52k images, and 5.1k human-verified instruction-answer pairs. VitaTouch achieves the best performance on HCT and the overall TVL benchmark, while remaining competitive on SSVTP. On VitaSet, it reaches 88.89% hardness accuracy, 75.13% roughness accuracy, and 54.81% descriptor recall; the material-description task further achieves a peak semantic similarity of 0.9009. With LoRA-based fine-tuning, VitaTouch attains 100.0%, 96.0%, and 92.0% accuracy for 2-, 3-, and 5-category defect recognition, respectively, and delivers 94.0% closed-loop recognition accuracy and 94.0% end-to-end sorting success in 100 laboratory robotic trials. More details are available at the project page: https://vitatouch.github.io/
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_03322
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing Zong, Junyi Jia, Qingxuan Shi, Meixian Li, Tong Li, Jiayuan Lv, Zihang Chen, Gang Deng, Fang Computer Vision and Pattern Recognition Artificial Intelligence Robotics Quality inspection in smart manufacturing requires identifying intrinsic material and surface properties beyond visible geometry, yet vision-only methods remain vulnerable to occlusion and reflection. We propose VitaTouch, a property-aware vision-tactile-language model for material-property inference and natural-language attribute description. VitaTouch uses modality-specific encoders and a dual Q-Former to extract language-relevant visual and tactile features, which are compressed into prefix tokens for a large language model. We align each modality with text and explicitly couple vision and touch through contrastive learning. We also construct VitaSet, a multimodal dataset with 186 objects, 52k images, and 5.1k human-verified instruction-answer pairs. VitaTouch achieves the best performance on HCT and the overall TVL benchmark, while remaining competitive on SSVTP. On VitaSet, it reaches 88.89% hardness accuracy, 75.13% roughness accuracy, and 54.81% descriptor recall; the material-description task further achieves a peak semantic similarity of 0.9009. With LoRA-based fine-tuning, VitaTouch attains 100.0%, 96.0%, and 92.0% accuracy for 2-, 3-, and 5-category defect recognition, respectively, and delivers 94.0% closed-loop recognition accuracy and 94.0% end-to-end sorting success in 100 laboratory robotic trials. More details are available at the project page: https://vitatouch.github.io/
title	VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing
topic	Computer Vision and Pattern Recognition Artificial Intelligence Robotics
url	https://arxiv.org/abs/2604.03322

Documenti analoghi