| Main Authors: | Cheng, Ning; Li, You; Gao, Jing; Fang, Bin; Xu, Jinan; Han, Wenjuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.09813 |
Similar Items
Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation
by: Cheng, Ning, et al.
Published: (2024)
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
by: Cheng, Ning, et al.
Published: (2025)
A Touch, Vision, and Language Dataset for Multimodal Alignment
by: Fu, Letian, et al.
Published: (2024)
SaPaVe: Towards Active Perception and Manipulation in Vision-Language-Action Models for Robotics
by: Liu, Mengzhen, et al.
Published: (2026)
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
by: Han, Xiaofeng, et al.
Published: (2025)
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
by: Feng, Ruoxuan, et al.
Published: (2026)
VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing
by: Zong, Junyi, et al.
Published: (2026)
V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views
by: You, Junwei, et al.
Published: (2026)
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
by: Wen, Junjie, et al.
Published: (2024)
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
by: Zuo, Jing, et al.
Published: (2026)
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
by: Yang, Fengyu, et al.
Published: (2024)
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
by: Feng, Ruoxuan, et al.
Published: (2025)
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model
by: Zhou, Zhongyi, et al.
Published: (2025)
ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation
by: Xue, Wei, et al.
Published: (2026)
Learning Gentle Grasping Using Vision, Sound, and Touch
by: Nakahara, Ken, et al.
Published: (2025)
Tacchi 2.0: A Low Computational Cost and Comprehensive Dynamic Contact Simulator for Vision-based Tactile Sensors
by: Sun, Yuhao, et al.
Published: (2025)
Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos
by: Xu, Haoxuan, et al.
Published: (2026)
Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation
by: Ding, Hongyu, et al.
Published: (2026)
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
by: Gu, Langzhe, et al.
Published: (2026)
Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting
by: Strong, Matthew, et al.
Published: (2024)
DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models
by: Li, Chenyang, et al.
Published: (2026)
Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception
by: Yang, Jiashu, et al.
Published: (2025)
Cross-Sensor Touch Generation
by: Rodriguez, Samanta, et al.
Published: (2025)
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments
by: Zhou, Yang, et al.
Published: (2024)
Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms
by: Cao, Zhixiang, et al.
Published: (2026)
MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
by: Yin, Zhenhan, et al.
Published: (2025)
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
by: Shi, Zhonghao, et al.
Published: (2025)
DOPE: Dual Object Perception-Enhancement Network for Vision-and-Language Navigation
by: Yu, Yinfeng, et al.
Published: (2025)
UAOR: Uncertainty-aware Observation Reinjection for Vision-Language-Action Models
by: Yang, Jiabing, et al.
Published: (2026)
dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
by: Wen, Junjie, et al.
Published: (2025)
Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain
by: Luo, Yulin, et al.
Published: (2025)
DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
by: Han, Wencheng, et al.
Published: (2024)
ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics
by: Wei, Ziyu, et al.
Published: (2026)
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
by: Liu, Haowen, et al.
Published: (2025)
When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs
by: Fang, Yu, et al.
Published: (2026)
Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
by: Hui, Chenyu, et al.
Published: (2026)
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
by: Zhang, Wenyao, et al.
Published: (2025)
RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception
by: Ma, Jiahao, et al.
Published: (2026)
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
by: Li, Yuyang, et al.
Published: (2025)
Masked Depth Modeling for Spatial Perception
by: Tan, Bin, et al.
Published: (2026)