:: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Ning; Li, You; Gao, Jing; Fang, Bin; Xu, Jinan; Han, Wenjuan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2403.09813

Similar Items

Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation
by: Cheng, Ning, et al.
Published: (2024)

SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
by: Cheng, Ning, et al.
Published: (2025)

A Touch, Vision, and Language Dataset for Multimodal Alignment
by: Fu, Letian, et al.
Published: (2024)

SaPaVe: Towards Active Perception and Manipulation in Vision-Language-Action Models for Robotics
by: Liu, Mengzhen, et al.
Published: (2026)

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
by: Han, Xiaofeng, et al.
Published: (2025)

AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
by: Feng, Ruoxuan, et al.
Published: (2026)

VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing
by: Zong, Junyi, et al.
Published: (2026)

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views
by: You, Junwei, et al.
Published: (2026)

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
by: Wen, Junjie, et al.
Published: (2024)

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
by: Zuo, Jing, et al.
Published: (2026)

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
by: Yang, Fengyu, et al.
Published: (2024)

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
by: Feng, Ruoxuan, et al.
Published: (2025)

ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model
by: Zhou, Zhongyi, et al.
Published: (2025)

ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation
by: Xue, Wei, et al.
Published: (2026)

Learning Gentle Grasping Using Vision, Sound, and Touch
by: Nakahara, Ken, et al.
Published: (2025)

Tacchi 2.0: A Low Computational Cost and Comprehensive Dynamic Contact Simulator for Vision-based Tactile Sensors
by: Sun, Yuhao, et al.
Published: (2025)

Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos
by: Xu, Haoxuan, et al.
Published: (2026)

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation
by: Ding, Hongyu, et al.
Published: (2026)

TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
by: Gu, Langzhe, et al.
Published: (2026)

Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting
by: Strong, Matthew, et al.
Published: (2024)

DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models
by: Li, Chenyang, et al.
Published: (2026)

Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception
by: Yang, Jiashu, et al.
Published: (2025)

Cross-Sensor Touch Generation
by: Rodriguez, Samanta, et al.
Published: (2025)

CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments
by: Zhou, Yang, et al.
Published: (2024)

Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms
by: Cao, Zhixiang, et al.
Published: (2026)

MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
by: Yin, Zhenhan, et al.
Published: (2025)

HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
by: Shi, Zhonghao, et al.
Published: (2025)

DOPE: Dual Object Perception-Enhancement Network for Vision-and-Language Navigation
by: Yu, Yinfeng, et al.
Published: (2025)

UAOR: Uncertainty-aware Observation Reinjection for Vision-Language-Action Models
by: Yang, Jiabing, et al.
Published: (2026)

dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
by: Wen, Junjie, et al.
Published: (2025)

Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain
by: Luo, Yulin, et al.
Published: (2025)

DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
by: Han, Wencheng, et al.
Published: (2024)

ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics
by: Wei, Ziyu, et al.
Published: (2026)

SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
by: Liu, Haowen, et al.
Published: (2025)

When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs
by: Fang, Yu, et al.
Published: (2026)

Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
by: Hui, Chenyu, et al.
Published: (2026)

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
by: Zhang, Wenyao, et al.
Published: (2025)

RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception
by: Ma, Jiahao, et al.
Published: (2026)

Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
by: Li, Yuyang, et al.
Published: (2025)

Masked Depth Modeling for Spatial Perception
by: Tan, Bin, et al.
Published: (2026)