| Main Authors: | Yang, Lihe, Li, Shang-Wen, Li, Yang, Lei, Xinjie, Wang, Dong, Mohamed, Abdelrahman, Zhao, Hengshuang, Xu, Hu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.15715 |
Similar Items
UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
by: Yang, Lihe, et al.
Published: (2024)
A Lightweight Clustering Framework for Unsupervised Semantic Segmentation
by: Cheung, Yau Shing Jonathan, et al.
Published: (2023)
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
by: Yang, Lihe, et al.
Published: (2024)
Depth Anything V2
by: Yang, Lihe, et al.
Published: (2024)
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
by: Zhang, Zheng, et al.
Published: (2024)
Depth Anything with Any Prior
by: Wang, Zehan, et al.
Published: (2025)
Osprey: Pixel Understanding with Visual Instruction Tuning
by: Yuan, Yuqian, et al.
Published: (2023)
There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training
by: Lei, Jiachen, et al.
Published: (2025)
PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
by: Wang, Jie, et al.
Published: (2025)
Formula-Supervised Visual-Geometric Pre-training
by: Yamada, Ryosuke, et al.
Published: (2024)
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
by: Tang, Longxiang, et al.
Published: (2024)
MedFILIP: Medical Fine-grained Language-Image Pre-training
by: Liang, Xinjie, et al.
Published: (2025)
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
by: Yang, Honghui, et al.
Published: (2023)
Split Adaptation for Pre-trained Vision Transformers
by: Wang, Lixu, et al.
Published: (2025)
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
by: Wang, Shumin, et al.
Published: (2025)
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
by: Su, Tongkun, et al.
Published: (2024)
Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
by: He, Yili, et al.
Published: (2025)
MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection
by: Li, Jianan, et al.
Published: (2024)
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving
by: Xu, Shaoqing, et al.
Published: (2024)
GDRO: Group-level Reward Post-training Suitable for Diffusion Models
by: Wang, Yiyang, et al.
Published: (2026)
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
by: Zhu, Haoyi, et al.
Published: (2023)
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
by: Long, Rujiao, et al.
Published: (2024)
4D Visual Pre-training for Robot Learning
by: Hou, Chengkai, et al.
Published: (2025)
Controlling the Latent Diffusion Model for Generative Image Shadow Removal via Residual Generation
by: Li, Xinjie, et al.
Published: (2024)
PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training
by: Xie, Yin, et al.
Published: (2025)
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
by: Haliassos, Alexandros, et al.
Published: (2024)
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
by: Zhao, Jiahe, et al.
Published: (2025)
Micro-Expression Recognition by Motion Feature Extraction based on Pre-training
by: Li, Ruolin, et al.
Published: (2024)
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
by: He, Xiaoxuan, et al.
Published: (2023)
Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
by: Jin, Yang, et al.
Published: (2024)
Efficient Transferability Assessment for Selection of Pre-trained Detectors
by: Wang, Zhao, et al.
Published: (2024)
DyArtbank: Diverse Artistic Style Transfer via Pre-trained Stable Diffusion and Dynamic Style Prompt Artbank
by: Zhang, Zhanjie, et al.
Published: (2025)
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
by: Yuan, Zhihao, et al.
Published: (2023)
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
by: Gao, Zuan, et al.
Published: (2024)
Pixel-Perfect Visual Geometry Estimation
by: Xu, Gangwei, et al.
Published: (2026)
Visual Spatial Tuning
by: Yang, Rui, et al.
Published: (2025)
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
by: Wang, Zhenyu, et al.
Published: (2024)
SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model
by: Zhang, Zhanjie, et al.
Published: (2025)
Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization
by: Li, Songlin, et al.
Published: (2025)
Scaling up Multimodal Pre-training for Sign Language Understanding
by: Zhou, Wengang, et al.
Published: (2024)