Bibliographic Details
Main Authors:	Yu, Hanxun; Li, Wentong; Wang, Song; Chen, Junbo; Zhu, Jianke
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.00513

Similar Items

VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
by: Yu, Hanxun, et al.
Published: (2026)

InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes
by: Liu, Hongyuan, et al.
Published: (2025)

Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
by: Wang, Song, et al.
Published: (2024)

3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
by: Li, Zeju, et al.
Published: (2024)

Label-efficient Semantic Scene Completion with Scribble Annotations
by: Wang, Song, et al.
Published: (2024)

A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
by: Shi, Zhan, et al.
Published: (2025)

Osprey: Pixel Understanding with Visual Instruction Tuning
by: Yuan, Yuqian, et al.
Published: (2023)

Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
by: Liu, Xiaolu, et al.
Published: (2025)

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
by: Liu, Xiaolu, et al.
Published: (2024)

ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning
by: Wang, Song, et al.
Published: (2024)

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
by: Yin, Junbo, et al.
Published: (2024)

Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)

Unlocking Dense Metric Depth Estimation in VLMs
by: Yu, Hanxun, et al.
Published: (2026)

Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
by: Gan, Qijun, et al.
Published: (2024)

HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
by: Li, Zhuopeng, et al.
Published: (2024)

InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
by: Kumar, Ashutosh, et al.
Published: (2026)

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
by: Liu, Enyu, et al.
Published: (2025)

Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding
by: Huang, Wencan, et al.
Published: (2025)

3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
by: Wang, Xiaoye, et al.
Published: (2025)

HVOFusion: Incremental Mesh Reconstruction Using Hybrid Voxel Octree
by: Liu, Shaofan, et al.
Published: (2024)

Inst4DGS: Instance-Decomposed 4D Gaussian Splatting with Multi-Video Label Permutation Learning
by: Lee, Yonghan, et al.
Published: (2026)

SAI3D: Segment Any Instance in 3D Scenes
by: Yin, Yingda, et al.
Published: (2023)

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
by: Yin, Yuehao, et al.
Published: (2023)

AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans
by: Perauer, Cedric, et al.
Published: (2024)

Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning
by: Lu, Yiting, et al.
Published: (2025)

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
by: Huang, Zehuan, et al.
Published: (2024)

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
by: Rozenberszki, David, et al.
Published: (2023)

Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds
by: Yang, Bin, et al.
Published: (2026)

Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
by: Yang, Yu-Qi, et al.
Published: (2024)

UDA4Inst: Unsupervised Domain Adaptation for Instance Segmentation
by: Guo, Yachan, et al.
Published: (2024)

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
by: Wang, Song, et al.
Published: (2025)

MambaMap: Online Vectorized HD Map Construction using State Space Model
by: Yang, Ruizi, et al.
Published: (2025)

3D Question Answering for City Scene Understanding
by: Sun, Penglei, et al.
Published: (2024)

WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
by: Zhang, Yisu, et al.
Published: (2026)

DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving
by: Liu, Xiaolu, et al.
Published: (2026)

Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
by: Liu, Xiaolu, et al.
Published: (2026)

AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing
by: Wang, Zhaonan, et al.
Published: (2025)

InstructSAM: Segment Any Instance with Any Instructions
by: Yuan, Yuqian, et al.
Published: (2026)

JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
by: Ji, Jiayi, et al.
Published: (2023)

INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
by: Peng, Wujian, et al.
Published: (2024)