:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Wenxuan; Xu, Xiuwei; Wang, Ziwei; Feng, Jianjiang; Zhou, Jie; Lu, Jiwen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2502.10392

Similar Items

IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
by: Guo, Wenxuan, et al.
Published: (2025)

3D Small Object Detection with Dynamic Spatial Pruning
by: Xu, Xiuwei, et al.
Published: (2023)

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
by: Liang, Yinan, et al.
Published: (2025)

EmbodiedSAM: Online Segment Any 3D Thing in Real Time
by: Xu, Xiuwei, et al.
Published: (2024)

Memory-based Adapters for Online 3D Scene Perception
by: Xu, Xiuwei, et al.
Published: (2024)

SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
by: Yin, Hang, et al.
Published: (2024)

AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation
by: Guo, Wenxuan, et al.
Published: (2026)

GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
by: Yin, Hang, et al.
Published: (2025)

Anyview: Generalizable Indoor 3D Object Detection with Variable Frames
by: Wu, Zhenyu, et al.
Published: (2023)

3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection
by: Guo, Zhanqiang, et al.
Published: (2024)

Pose-Specific 3D Fingerprint Unfolding
by: Guan, Xiongjun, et al.
Published: (2024)

Towards Accurate Post-training Quantization for Diffusion Models
by: Wang, Changyuan, et al.
Published: (2023)

Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)

LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-timestamp 3D Human Pose Estimation
by: Pan, Zhiyu, et al.
Published: (2023)

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
by: Yin, Hang, et al.
Published: (2025)

R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
by: Xu, Xiuwei, et al.
Published: (2025)

Cross-Modal Registration Between 3D and 2D Fingerprints via Pose-Aware Unwrapping and Point-Cloud Fusion
by: Guan, Xiongjun, et al.
Published: (2026)

iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion
by: Wang, Hao, et al.
Published: (2025)

LiDAR-HMR: 3D Human Mesh Recovery from LiDAR
by: Fan, Bohao, et al.
Published: (2023)

Phase-aggregated Dual-branch Network for Efficient Fingerprint Dense Registration
by: Guan, Xiongjun, et al.
Published: (2024)

Cross-Domain Vessel Segmentation via Latent Similarity Mining and Iterative Co-Optimization
by: Guo, Zhanqiang, et al.
Published: (2026)

HumanReg: Self-supervised Non-rigid Registration of Human Point Cloud
by: Chen, Yifan, et al.
Published: (2023)

Camera-LiDAR Cross-modality Gait Recognition
by: Guo, Wenxuan, et al.
Published: (2024)

Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors
by: Guo, Wenxuan, et al.
Published: (2024)

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
by: Wang, Ziyi, et al.
Published: (2025)

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
by: Wang, Ziyi, et al.
Published: (2024)

Streaming 4D Visual Geometry Transformer
by: Zhuo, Dong, et al.
Published: (2025)

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
by: Wu, Yuqi, et al.
Published: (2025)

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
by: Zhang, Yanran, et al.
Published: (2025)

A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
by: Liu, Daizong, et al.
Published: (2024)

Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos
by: Dou, Weijia, et al.
Published: (2026)

VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking
by: Lu, Yuxuan, et al.
Published: (2024)

SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels
by: Yu, Qiucheng, et al.
Published: (2025)

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
by: Sun, Shuofeng, et al.
Published: (2024)

Fixed-length Dense Descriptor for Efficient Fingerprint Matching
by: Pan, Zhiyu, et al.
Published: (2023)

MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection
by: Li, Jianan, et al.
Published: (2024)

PhysX-3D: Physical-Grounded 3D Asset Generation
by: Cao, Ziang, et al.
Published: (2025)

PointVDP: Learning View-Dependent Projection by Fireworks Rays for 3D Point Cloud Segmentation
by: Chen, Yang, et al.
Published: (2025)

SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection
by: Son, Hyeongseok, et al.
Published: (2025)

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
by: Huang, Yuanhui, et al.
Published: (2024)