| Main Authors: | Yu, Hanxun, Li, Wentong, Wang, Song, Chen, Junbo, Zhu, Jianke |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.00513 |
Similar Items
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
by: Yu, Hanxun, et al.
Published: (2026)
InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes
by: Liu, Hongyuan, et al.
Published: (2025)
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
by: Wang, Song, et al.
Published: (2024)
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
by: Li, Zeju, et al.
Published: (2024)
Label-efficient Semantic Scene Completion with Scribble Annotations
by: Wang, Song, et al.
Published: (2024)
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
by: Shi, Zhan, et al.
Published: (2025)
Osprey: Pixel Understanding with Visual Instruction Tuning
by: Yuan, Yuqian, et al.
Published: (2023)
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
by: Liu, Xiaolu, et al.
Published: (2025)
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
by: Liu, Xiaolu, et al.
Published: (2024)
ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning
by: Wang, Song, et al.
Published: (2024)
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
by: Yin, Junbo, et al.
Published: (2024)
Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)
Unlocking Dense Metric Depth Estimation in VLMs
by: Yu, Hanxun, et al.
Published: (2026)
Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
by: Gan, Qijun, et al.
Published: (2024)
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
by: Li, Zhuopeng, et al.
Published: (2024)
InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
by: Kumar, Ashutosh, et al.
Published: (2026)
Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
by: Liu, Enyu, et al.
Published: (2025)
Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding
by: Huang, Wencan, et al.
Published: (2025)
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
by: Wang, Xiaoye, et al.
Published: (2025)
HVOFusion: Incremental Mesh Reconstruction Using Hybrid Voxel Octree
by: Liu, Shaofan, et al.
Published: (2024)
Inst4DGS: Instance-Decomposed 4D Gaussian Splatting with Multi-Video Label Permutation Learning
by: Lee, Yonghan, et al.
Published: (2026)
SAI3D: Segment Any Instance in 3D Scenes
by: Yin, Yingda, et al.
Published: (2023)
FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
by: Yin, Yuehao, et al.
Published: (2023)
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans
by: Perauer, Cedric, et al.
Published: (2024)
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning
by: Lu, Yiting, et al.
Published: (2025)
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
by: Huang, Zehuan, et al.
Published: (2024)
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
by: Rozenberszki, David, et al.
Published: (2023)
Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds
by: Yang, Bin, et al.
Published: (2026)
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
by: Yang, Yu-Qi, et al.
Published: (2024)
UDA4Inst: Unsupervised Domain Adaptation for Instance Segmentation
by: Guo, Yachan, et al.
Published: (2024)
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
by: Wang, Song, et al.
Published: (2025)
MambaMap: Online Vectorized HD Map Construction using State Space Model
by: Yang, Ruizi, et al.
Published: (2025)
3D Question Answering for City Scene Understanding
by: Sun, Penglei, et al.
Published: (2024)
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
by: Zhang, Yisu, et al.
Published: (2026)
DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving
by: Liu, Xiaolu, et al.
Published: (2026)
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
by: Liu, Xiaolu, et al.
Published: (2026)
AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing
by: Wang, Zhaonan, et al.
Published: (2025)
InstructSAM: Segment Any Instance with Any Instructions
by: Yuan, Yuqian, et al.
Published: (2026)
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
by: Ji, Jiayi, et al.
Published: (2023)
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
by: Peng, Wujian, et al.
Published: (2024)