Library Catalog

Bibliographic Details
Main Authors:	Liu, Qi; Li, Yabei; Wang, Hongsong; He, Lei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning; Robotics
Online Access:	https://arxiv.org/abs/2508.10935

Similar Items

VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
by: Chang, Fuhao, et al.
Published: (2025)

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
by: Cai, Junhao, et al.
Published: (2024)

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
by: Chu, Hengshuo, et al.
Published: (2025)

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2024)

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
by: Wang, Shihao, et al.
Published: (2026)

PointVLA: Injecting the 3D World into Vision-Language-Action Models
by: Li, Chengmeng, et al.
Published: (2025)

Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian
by: Chahe, Amirhosein, et al.
Published: (2024)

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
by: Kong, Lingdong, et al.
Published: (2024)

Learning 3D Persistent Embodied World Models
by: Zhou, Siyuan, et al.
Published: (2025)

OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
by: Chen, Gang, et al.
Published: (2026)

3D and 4D World Modeling: A Survey
by: Kong, Lingdong, et al.
Published: (2025)

3D-CDRGP: Towards Cross-Device Robotic Grasping Policy in 3D Open World
by: Zhao, Weiguang, et al.
Published: (2024)

3DRot: Rediscovering the Missing Primitive for RGB-Based 3D Augmentation
by: Yang, Shitian, et al.
Published: (2025)

TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos
by: Lee, Seungjae, et al.
Published: (2025)

4D Contrastive Superflows are Dense 3D Representation Learners
by: Xu, Xiang, et al.
Published: (2024)

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models
by: Lu, Ziqi, et al.
Published: (2024)

SceneFoundry: Generating Interactive Infinite 3D Worlds
by: Chen, ChunTeng, et al.
Published: (2026)

Online Signature Verification based on the Lagrange formulation with 2D and 3D robotic models
by: Diaz, Moises, et al.
Published: (2025)

BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2025)

Rethink 3D Object Detection from Physical World
by: Tanaka, Satoshi, et al.
Published: (2025)

Generalizable Humanoid Manipulation with 3D Diffusion Policies
by: Ze, Yanjie, et al.
Published: (2024)

3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
by: Zhi, Hongyan, et al.
Published: (2025)

Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection
by: Khurana, Mehar, et al.
Published: (2024)

Unsupervised Change Detection for Space Habitats Using 3D Point Clouds
by: Santos, Jamie, et al.
Published: (2023)

TimePillars: Temporally-Recurrent 3D LiDAR Object Detection
by: Calvo, Ernesto Lozano, et al.
Published: (2023)

Large Pre-Trained Models for Bimanual Manipulation in 3D
by: Yurchyk, Hanna, et al.
Published: (2025)

Opening the Black Box of 3D Reconstruction Error Analysis with VECTOR
by: Fygenson, Racquel, et al.
Published: (2024)

D³Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement
by: Wang, Yixuan, et al.
Published: (2023)

The 2nd Place Solution from the 3D Semantic Segmentation Track in the 2024 Waymo Open Dataset Challenge
by: Wu, Qing
Published: (2025)

OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
by: Kassab, Christina, et al.
Published: (2025)

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
by: Ze, Yanjie, et al.
Published: (2024)

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
by: Wu, Yanmin, et al.
Published: (2024)

Vision-based Manipulation from Single Human Video with Open-World Object Graphs
by: Zhu, Yifeng, et al.
Published: (2024)

VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM
by: Pak, Gyuhyeon, et al.
Published: (2025)

R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation
by: Ljungbergh, William, et al.
Published: (2025)

Articulated 3D Scene Graphs for Open-World Mobile Manipulation
by: Büchner, Martin, et al.
Published: (2026)

Systematic Evaluation of Depth Backbones and Semantic Cues for Monocular Pseudo-LiDAR 3D Detection
by: Ajadalu, Samson Oseiwe
Published: (2026)

MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving
by: Liu, Hongsi, et al.
Published: (2024)

SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction
by: Chen, Suzeyu, et al.
Published: (2026)

MEDL-U: Uncertainty-aware 3D Automatic Annotation based on Evidential Deep Learning
by: Paat, Helbert, et al.
Published: (2023)