:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Leo; Patel, Shivansh; Moon, Jay; Lazebnik, Svetlana; Jain, Unnat
Format:	Preprint
Published:	2026
Subjects:	Robotics; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.12120

Similar Items

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
by: Patel, Shivansh, et al.
Published: (2025)

A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
by: Patel, Shivansh, et al.
Published: (2025)

ViPRA: Video Prediction for Robot Actions
by: Routray, Sandeep, et al.
Published: (2025)

CRAFT: Video Diffusion for Bimanual Robot Data Generation
by: Chen, Jason, et al.
Published: (2026)

Hand-Object Interaction Pretraining from Videos
by: Singh, Himanshu Gaurav, et al.
Published: (2024)

Bimanual Grasp Synthesis for Dexterous Robot Hands
by: Shao, Yanming, et al.
Published: (2024)

Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification
by: Wen, Jiawen, et al.
Published: (2026)

ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics
by: Wei, Ziyu, et al.
Published: (2026)

MOPA: Modular Object Navigation with PointGoal Agents
by: Raychaudhuri, Sonia, et al.
Published: (2023)

Twisting Lids Off with Two Hands
by: Lin, Toru, et al.
Published: (2024)

OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction
by: Song, Yuxin Ray, et al.
Published: (2025)

RealDex: Towards Human-like Grasping for Robotic Dexterous Hand
by: Liu, Yumeng, et al.
Published: (2024)

Learning Visuotactile Skills with Two Multifingered Hands
by: Lin, Toru, et al.
Published: (2024)

World Models for Learning Dexterous Hand-Object Interactions from Human Videos
by: Goswami, Raktim Gautam, et al.
Published: (2025)

PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions
by: Lee, Jihyun, et al.
Published: (2026)

Conditioning Latent-Space Clusters for Real-World Anomaly Classification
by: Bogdoll, Daniel, et al.
Published: (2023)

Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
by: He, Yufei, et al.
Published: (2025)

HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
by: Valassakis, Eugene, et al.
Published: (2024)

Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition
by: Michieli, Umberto, et al.
Published: (2024)

ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation
by: Jain, Vidhi, et al.
Published: (2024)

SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation
by: Zhi, Wang, et al.
Published: (2025)

FlowHOI: Flow-based Semantics-Grounded Generation of Hand-Object Interactions for Dexterous Robot Manipulation
by: Zeng, Huajian, et al.
Published: (2026)

HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
by: Banerjee, Prithviraj, et al.
Published: (2024)

World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
by: Lin, Zuyao, et al.
Published: (2026)

Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
by: Chen, Mingfei, et al.
Published: (2025)

ViTaS: Visual Tactile Soft Fusion Contrastive Learning for Visuomotor Learning
by: Tian, Yufeng, et al.
Published: (2026)

VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation
by: Taioli, Francesco, et al.
Published: (2026)

Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search
by: Paramonov, Kirill, et al.
Published: (2024)

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
by: Zheng, Jinliang, et al.
Published: (2025)

Autonomous Robot for Disaster Mapping and Victim Localization
by: Potter, Michael, et al.
Published: (2024)

What Matters to Enhance Traffic Rule Compliance of Imitation Learning for End-to-End Autonomous Driving
by: Zhou, Hongkuan, et al.
Published: (2023)

3D Hand Pose Estimation in Everyday Egocentric Images
by: Prakash, Aditya, et al.
Published: (2023)

Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering
by: Frahm, Noah, et al.
Published: (2025)

A Vision-Enabled Prosthetic Hand for Children with Upper Limb Disabilities
by: Sarker, Md Abdul Baset, et al.
Published: (2025)

Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping
by: Liang, William, et al.
Published: (2026)

Environment-Driven Online LiDAR-Camera Extrinsic Calibration
by: Huang, Zhiwei, et al.
Published: (2025)

Runtime Safety Monitoring of Deep Neural Networks for Perception: A Survey
by: Schotschneider, Albert, et al.
Published: (2025)

A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
by: Zhang, Yi, et al.
Published: (2025)

Unifying 2D and 3D Vision-Language Understanding
by: Jain, Ayush, et al.
Published: (2025)

Self-driving cars: Are we there yet?
by: Atasever, Merve, et al.
Published: (2025)