| Main Authors: | Liu, Qi, Li, Yabei, Wang, Hongsong, He, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.10935 |
Similar Items
VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
by: Chang, Fuhao, et al.
Published: (2025)
OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
by: Cai, Junhao, et al.
Published: (2024)
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
by: Chu, Hengshuo, et al.
Published: (2025)
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2024)
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
by: Wang, Shihao, et al.
Published: (2026)
PointVLA: Injecting the 3D World into Vision-Language-Action Models
by: Li, Chengmeng, et al.
Published: (2025)
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian
by: Chahe, Amirhosein, et al.
Published: (2024)
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
by: Kong, Lingdong, et al.
Published: (2024)
Learning 3D Persistent Embodied World Models
by: Zhou, Siyuan, et al.
Published: (2025)
OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
by: Chen, Gang, et al.
Published: (2026)
3D and 4D World Modeling: A Survey
by: Kong, Lingdong, et al.
Published: (2025)
3D-CDRGP: Towards Cross-Device Robotic Grasping Policy in 3D Open World
by: Zhao, Weiguang, et al.
Published: (2024)
3DRot: Rediscovering the Missing Primitive for RGB-Based 3D Augmentation
by: Yang, Shitian, et al.
Published: (2025)
TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos
by: Lee, Seungjae, et al.
Published: (2025)
4D Contrastive Superflows are Dense 3D Representation Learners
by: Xu, Xiang, et al.
Published: (2024)
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models
by: Lu, Ziqi, et al.
Published: (2024)
SceneFoundry: Generating Interactive Infinite 3D Worlds
by: Chen, ChunTeng, et al.
Published: (2026)
Online Signature Verification based on the Lagrange formulation with 2D and 3D robotic models
by: Diaz, Moises, et al.
Published: (2025)
BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2025)
Rethink 3D Object Detection from Physical World
by: Tanaka, Satoshi, et al.
Published: (2025)
Generalizable Humanoid Manipulation with 3D Diffusion Policies
by: Ze, Yanjie, et al.
Published: (2024)
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
by: Zhi, Hongyan, et al.
Published: (2025)
Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection
by: Khurana, Mehar, et al.
Published: (2024)
Unsupervised Change Detection for Space Habitats Using 3D Point Clouds
by: Santos, Jamie, et al.
Published: (2023)
TimePillars: Temporally-Recurrent 3D LiDAR Object Detection
by: Calvo, Ernesto Lozano, et al.
Published: (2023)
Large Pre-Trained Models for Bimanual Manipulation in 3D
by: Yurchyk, Hanna, et al.
Published: (2025)
Opening the Black Box of 3D Reconstruction Error Analysis with VECTOR
by: Fygenson, Racquel, et al.
Published: (2024)
D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement
by: Wang, Yixuan, et al.
Published: (2023)
The 2nd Place Solution from the 3D Semantic Segmentation Track in the 2024 Waymo Open Dataset Challenge
by: Wu, Qing
Published: (2025)
OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
by: Kassab, Christina, et al.
Published: (2025)
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
by: Ze, Yanjie, et al.
Published: (2024)
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
by: Wu, Yanmin, et al.
Published: (2024)
Vision-based Manipulation from Single Human Video with Open-World Object Graphs
by: Zhu, Yifeng, et al.
Published: (2024)
VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM
by: Pak, Gyuhyeon, et al.
Published: (2025)
R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation
by: Ljungbergh, William, et al.
Published: (2025)
Articulated 3D Scene Graphs for Open-World Mobile Manipulation
by: Büchner, Martin, et al.
Published: (2026)
Systematic Evaluation of Depth Backbones and Semantic Cues for Monocular Pseudo-LiDAR 3D Detection
by: Ajadalu, Samson Oseiwe
Published: (2026)
MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving
by: Liu, Hongsi, et al.
Published: (2024)
SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction
by: Chen, Suzeyu, et al.
Published: (2026)
MEDL-U: Uncertainty-aware 3D Automatic Annotation based on Evidential Deep Learning
by: Paat, Helbert, et al.
Published: (2023)