Library Catalog

Bibliographic Details
Main Authors:	Ansari, Junaid Ahmed; Ding, Ran; Pizzati, Fabio; Laptev, Ivan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Robotics
Online Access:	https://arxiv.org/abs/2603.16868

Similar Items

PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
by: Zhang, Yangsong, et al.
Published: (2026)

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
by: Han, Mingfei, et al.
Published: (2024)

Learning human-to-robot handovers through 3D scene reconstruction
by: Wu, Yuekun, et al.
Published: (2025)

Learning to Generate Rigid Body Interactions with Video Diffusion Models
by: Romero, David, et al.
Published: (2025)

MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
by: Zhang, Pingrui, et al.
Published: (2025)

DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations
by: Son, Dongwon, et al.
Published: (2024)

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly
by: Ma, Liang, et al.
Published: (2025)

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
by: Yuan, Jianhao, et al.
Published: (2025)

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation
by: Khalifi, Omar El, et al.
Published: (2026)

GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation
by: Bilić, Ivan, et al.
Published: (2024)

Enhancing 3D Point Cloud Classification with ModelNet-R and Point-SkipNet
by: Saeid, Mohammad, et al.
Published: (2025)

ContactHandover: Contact-Guided Robot-to-Human Object Handover
by: Wang, Zixi, et al.
Published: (2024)

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)

ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
by: Yu, Jiawen, et al.
Published: (2025)

Surg-InvNeRF: Invertible NeRF for 3D tracking and reconstruction in surgical vision
by: Loza, Gerardo, et al.
Published: (2025)

FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
by: Rotondi, Dennis, et al.
Published: (2025)

PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
by: Abdelreheem, Ahmed, et al.
Published: (2025)

UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
by: Xu, Zhengtong, et al.
Published: (2026)

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
by: Chen, Zerui, et al.
Published: (2024)

One-Shot Manipulation Strategy Learning by Making Contact Analogies
by: Liu, Yuyao, et al.
Published: (2024)

ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
by: Wang, Meizhong, et al.
Published: (2026)

RDD4D: 4D Attention-Guided Road Damage Detection And Classification
by: Alkalbani, Asma, et al.
Published: (2025)

Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors
by: Liao, Ziwei, et al.
Published: (2024)

Point-LN: A Lightweight Framework for Efficient Point Cloud Classification Using Non-Parametric Positional Encoding
by: Mohammadi, Marzieh, et al.
Published: (2025)

Towards Realistic Scene Generation with LiDAR Diffusion Models
by: Ran, Haoxi, et al.
Published: (2024)

How Good are Foundation Models in Step-by-Step Embodied Reasoning?
by: Dissanayake, Dinura, et al.
Published: (2025)

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos
by: Han, Mingfei, et al.
Published: (2026)

Joint stereo 3D object detection and implicit surface reconstruction
by: Li, Shichao, et al.
Published: (2021)

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar
by: Ding, Fangqiang, et al.
Published: (2024)

V3D-SLAM: Robust RGB-D SLAM in Dynamic Environments with 3D Semantic Geometry Voting
by: Dang, Tuan, et al.
Published: (2024)

GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion
by: Koledić, Karlo, et al.
Published: (2024)

g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
by: Wang, Zihan, et al.
Published: (2024)

FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
by: Eisner, Ben, et al.
Published: (2022)

Unifying 2D and 3D Vision-Language Understanding
by: Jain, Ayush, et al.
Published: (2025)

High-fidelity 3D reconstruction for planetary exploration
by: Martínez-Petersen, Alfonso, et al.
Published: (2026)

HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
by: Banerjee, Prithviraj, et al.
Published: (2024)

Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting
by: Büttner, Michael, et al.
Published: (2024)

ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory
by: Li, Ying, et al.
Published: (2025)

AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
by: Xu, Tianling, et al.
Published: (2025)

From Prompts to Pavement: LMMs-based Agentic Behavior-Tree Generation Framework for Autonomous Vehicles
by: Goba, Omar Y., et al.
Published: (2026)