Bibliographic Details
Main Authors:	Yang, Lihe; Li, Shang-Wen; Li, Yang; Lei, Xinjie; Wang, Dong; Mohamed, Abdelrahman; Zhao, Hengshuang; Xu, Hu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.15715

Similar Items

UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
by: Yang, Lihe, et al.
Published: (2024)

A Lightweight Clustering Framework for Unsupervised Semantic Segmentation
by: Cheung, Yau Shing Jonathan, et al.
Published: (2023)

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
by: Yang, Lihe, et al.
Published: (2024)

Depth Anything V2
by: Yang, Lihe, et al.
Published: (2024)

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
by: Zhang, Zheng, et al.
Published: (2024)

Depth Anything with Any Prior
by: Wang, Zehan, et al.
Published: (2025)

Osprey: Pixel Understanding with Visual Instruction Tuning
by: Yuan, Yuqian, et al.
Published: (2023)

There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training
by: Lei, Jiachen, et al.
Published: (2025)

PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
by: Wang, Jie, et al.
Published: (2025)

Formula-Supervised Visual-Geometric Pre-training
by: Yamada, Ryosuke, et al.
Published: (2024)

Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
by: Tang, Longxiang, et al.
Published: (2024)

MedFILIP: Medical Fine-grained Language-Image Pre-training
by: Liang, Xinjie, et al.
Published: (2025)

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
by: Yang, Honghui, et al.
Published: (2023)

Split Adaptation for Pre-trained Vision Transformers
by: Wang, Lixu, et al.
Published: (2025)

Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
by: Wang, Shumin, et al.
Published: (2025)

Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
by: Su, Tongkun, et al.
Published: (2024)

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
by: He, Yili, et al.
Published: (2025)

MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection
by: Li, Jianan, et al.
Published: (2024)

GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving
by: Xu, Shaoqing, et al.
Published: (2024)

GDRO: Group-level Reward Post-training Suitable for Diffusion Models
by: Wang, Yiyang, et al.
Published: (2026)

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
by: Zhu, Haoyi, et al.
Published: (2023)

HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
by: Long, Rujiao, et al.
Published: (2024)

4D Visual Pre-training for Robot Learning
by: Hou, Chengkai, et al.
Published: (2025)

Controlling the Latent Diffusion Model for Generative Image Shadow Removal via Residual Generation
by: Li, Xinjie, et al.
Published: (2024)

PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training
by: Xie, Yin, et al.
Published: (2025)

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
by: Haliassos, Alexandros, et al.
Published: (2024)

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
by: Zhao, Jiahe, et al.
Published: (2025)

Micro-Expression Recognition by Motion Feature Extraction based on Pre-training
by: Li, Ruolin, et al.
Published: (2024)

Unified Medical Image Pre-training in Language-Guided Common Semantic Space
by: He, Xiaoxuan, et al.
Published: (2023)

Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
by: Jin, Yang, et al.
Published: (2024)

Efficient Transferability Assessment for Selection of Pre-trained Detectors
by: Wang, Zhao, et al.
Published: (2024)

DyArtbank: Diverse Artistic Style Transfer via Pre-trained Stable Diffusion and Dynamic Style Prompt Artbank
by: Zhang, Zhanjie, et al.
Published: (2025)

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
by: Yuan, Zhihao, et al.
Published: (2023)

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
by: Gao, Zuan, et al.
Published: (2024)

Pixel-Perfect Visual Geometry Estimation
by: Xu, Gangwei, et al.
Published: (2026)

Visual Spatial Tuning
by: Yang, Rui, et al.
Published: (2025)

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
by: Wang, Zhenyu, et al.
Published: (2024)

SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model
by: Zhang, Zhanjie, et al.
Published: (2025)

Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization
by: Li, Songlin, et al.
Published: (2025)

Scaling up Multimodal Pre-training for Sign Language Understanding
by: Zhou, Wengang, et al.
Published: (2024)