| Main Authors: | Liu, Hao; Ma, Yanni; Liu, Yan; Xiao, Haihong; He, Ying |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.18666 |
Similar Items
Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting
by: Liu, Hao, et al.
Published: (2024)
Edge-Centric Relational Reasoning for 3D Scene Graph Prediction
by: Ma, Yanni, et al.
Published: (2025)
O$^2$-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model
by: Hu, Yubin, et al.
Published: (2023)
Anatomical Structure-Guided Medical Vision-Language Pre-training
by: Li, Qingqiu, et al.
Published: (2024)
Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation
by: Zhang, Dingwen, et al.
Published: (2024)
SplatCo: Structure-View Collaborative Gaussian Splatting for Detail-Preserving Rendering of Large-Scale Unbounded Scenes
by: Xiao, Haihong, et al.
Published: (2025)
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
by: Liu, Che, et al.
Published: (2023)
Graph-Guided Dual-Level Augmentation for 3D Scene Segmentation
by: Lin, Hongbin, et al.
Published: (2025)
Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model
by: Ma, Shuailei, et al.
Published: (2023)
Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting
by: Lu, Jiahui, et al.
Published: (2025)
View-on-Graph: Zero-shot 3D Visual Grounding via Vision-Language Reasoning on Scene Graphs
by: Liu, Yuanyuan, et al.
Published: (2025)
Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting
by: Liu, Keyi, et al.
Published: (2025)
Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training
by: Li, Wenyu, et al.
Published: (2025)
CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge
by: Lin, Xiao, et al.
Published: (2024)
HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping
by: Ma, Yu, et al.
Published: (2025)
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
by: Hua, Hang, et al.
Published: (2024)
LSVG: Language-Guided Scene Graphs with 2D-Assisted Multi-Modal Encoding for 3D Visual Grounding
by: Xiao, Feng, et al.
Published: (2025)
UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving
by: Min, Chen, et al.
Published: (2023)
LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection
by: Hao, Lei, et al.
Published: (2025)
MG-3D: Multi-Grained Knowledge-Enhanced 3D Medical Vision-Language Pre-training
by: Ni, Xuefeng, et al.
Published: (2024)
EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes
by: Guo, Jianlin, et al.
Published: (2025)
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
by: Wang, Haicheng, et al.
Published: (2024)
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
by: Liu, Haowei, et al.
Published: (2024)
Event Camera Data Dense Pre-training
by: Yang, Yan, et al.
Published: (2023)
Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting
by: Cheng, Chong, et al.
Published: (2025)
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
by: Zhao, Shuai, et al.
Published: (2023)
Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
by: Ruan, Shouwei, et al.
Published: (2024)
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
by: Wang, Xiao, et al.
Published: (2025)
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment
by: Liu, Kangcheng, et al.
Published: (2023)
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement
by: Wang, Yuxuan, et al.
Published: (2024)
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
by: Yang, Ying, et al.
Published: (2025)
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
by: Wang, Ling, et al.
Published: (2026)
Toward Scene Graph and Layout Guided Complex 3D Scene Generation
by: Huang, Yu-Hsiang, et al.
Published: (2024)
Efficient Vision-Language Pre-training by Cluster Masking
by: Wei, Zihao, et al.
Published: (2024)
Enhancing Vision-Language Pre-training with Rich Supervisions
by: Gao, Yuan, et al.
Published: (2024)
LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
by: Huang, Jiangyong, et al.
Published: (2025)
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
by: Liu, Jihao, et al.
Published: (2024)
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
by: Wang, Rongsheng, et al.
Published: (2023)
LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
by: Hao, Haihong, et al.
Published: (2026)
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
by: He, Xiaoxuan, et al.
Published: (2023)