| Main Authors: | Ansari, Junaid Ahmed, Ding, Ran, Pizzati, Fabio, Laptev, Ivan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.16868 |
Similar Items
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
by: Zhang, Yangsong, et al.
Published: (2026)
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
by: Han, Mingfei, et al.
Published: (2024)
Learning human-to-robot handovers through 3D scene reconstruction
by: Wu, Yuekun, et al.
Published: (2025)
Learning to Generate Rigid Body Interactions with Video Diffusion Models
by: Romero, David, et al.
Published: (2025)
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
by: Zhang, Pingrui, et al.
Published: (2025)
DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations
by: Son, Dongwon, et al.
Published: (2024)
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly
by: Ma, Liang, et al.
Published: (2025)
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
by: Yuan, Jianhao, et al.
Published: (2025)
ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation
by: Khalifi, Omar El, et al.
Published: (2026)
GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation
by: Bilić, Ivan, et al.
Published: (2024)
Enhancing 3D Point Cloud Classification with ModelNet-R and Point-SkipNet
by: Saeid, Mohammad, et al.
Published: (2025)
ContactHandover: Contact-Guided Robot-to-Human Object Handover
by: Wang, Zixi, et al.
Published: (2024)
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
by: Yu, Jiawen, et al.
Published: (2025)
Surg-InvNeRF: Invertible NeRF for 3D tracking and reconstruction in surgical vision
by: Loza, Gerardo, et al.
Published: (2025)
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
by: Rotondi, Dennis, et al.
Published: (2025)
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
by: Abdelreheem, Ahmed, et al.
Published: (2025)
UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
by: Xu, Zhengtong, et al.
Published: (2026)
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
by: Chen, Zerui, et al.
Published: (2024)
One-Shot Manipulation Strategy Learning by Making Contact Analogies
by: Liu, Yuyao, et al.
Published: (2024)
ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
by: Wang, Meizhong, et al.
Published: (2026)
RDD4D: 4D Attention-Guided Road Damage Detection And Classification
by: Alkalbani, Asma, et al.
Published: (2025)
Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors
by: Liao, Ziwei, et al.
Published: (2024)
Point-LN: A Lightweight Framework for Efficient Point Cloud Classification Using Non-Parametric Positional Encoding
by: Mohammadi, Marzieh, et al.
Published: (2025)
Towards Realistic Scene Generation with LiDAR Diffusion Models
by: Ran, Haoxi, et al.
Published: (2024)
How Good are Foundation Models in Step-by-Step Embodied Reasoning?
by: Dissanayake, Dinura, et al.
Published: (2025)
Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos
by: Han, Mingfei, et al.
Published: (2026)
Joint stereo 3D object detection and implicit surface reconstruction
by: Li, Shichao, et al.
Published: (2021)
RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar
by: Ding, Fangqiang, et al.
Published: (2024)
V3D-SLAM: Robust RGB-D SLAM in Dynamic Environments with 3D Semantic Geometry Voting
by: Dang, Tuan, et al.
Published: (2024)
GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion
by: Koledić, Karlo, et al.
Published: (2024)
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
by: Wang, Zihan, et al.
Published: (2024)
FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
by: Eisner, Ben, et al.
Published: (2022)
Unifying 2D and 3D Vision-Language Understanding
by: Jain, Ayush, et al.
Published: (2025)
High-fidelity 3D reconstruction for planetary exploration
by: Martínez-Petersen, Alfonso, et al.
Published: (2026)
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
by: Banerjee, Prithviraj, et al.
Published: (2024)
Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting
by: Büttner, Michael, et al.
Published: (2024)
ManipDreamer3D : Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory
by: Li, Ying, et al.
Published: (2025)
AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
by: Xu, Tianling, et al.
Published: (2025)
From Prompts to Pavement: LMMs-based Agentic Behavior-Tree Generation Framework for Autonomous Vehicles
by: Goba, Omar Y., et al.
Published: (2026)