Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yuhan; Ma, Guoqing; Hao, Guangfu; Guo, Liangxuan; Chen, Yang; Yu, Shan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.05555

Similar Items

Flexible Tool Selection through Low-dimensional Attribute Alignment of Vision and Language
by: Hao, Guangfu, et al.
Published: (2025)

Out-of-distribution forgetting: vulnerability of continual learning to intra-class distribution shift
by: Guo, Liangxuan, et al.
Published: (2023)

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
by: Guo, Qingpei, et al.
Published: (2024)

LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering
by: Chen, Yuhan, et al.
Published: (2024)

OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
by: Liu, Yanqing, et al.
Published: (2025)

3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
by: Yan, Siming, et al.
Published: (2023)

One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
by: Gao, Yuan, et al.
Published: (2025)

GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
by: Ma, Guoqing, et al.
Published: (2026)

Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)

Contrastive Pretraining with Dual Visual Encoders for Gloss-Free Sign Language Translation
by: Sincan, Ozge Mercanoglu, et al.
Published: (2025)

FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)

Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
by: Schäfer, Lukas, et al.
Published: (2023)

Pretrained Reversible Generation as Unsupervised Visual Representation Learning
by: Xue, Rongkun, et al.
Published: (2024)

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
by: Chen, Yuxin, et al.
Published: (2024)

AlignTok: Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
by: Chen, Bowei, et al.
Published: (2025)

Object-Centric Pretraining via Target Encoder Bootstrapping
by: Đukić, Nikola, et al.
Published: (2025)

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
by: Chen, Chaofeng, et al.
Published: (2023)

Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
by: Zhang, Yuan, et al.
Published: (2026)

Granulon: Awakening Pixel-Level Visual Encoders with Adaptive Multi-Granularity Semantics for MLLM
by: Mao, Junyuan, et al.
Published: (2026)

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback
by: Chen, Yang, et al.
Published: (2025)

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
by: Su, Zhaochen, et al.
Published: (2025)

Efficient Image Synthesis with Sphere Latent Encoder
by: Do, Tung, et al.
Published: (2026)

GSE: Evaluating Sticker Visual Semantic Similarity via a General Sticker Encoder
by: Chee, Heng Er Metilda, et al.
Published: (2025)

Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
by: Zhang, Xuyang, et al.
Published: (2025)

Negative Prototypes Guided Contrastive Learning for WSOD
by: Zhang, Yu, et al.
Published: (2024)

Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)

A Cascaded Information Interaction Network for Precise Image Segmentation
by: Xiao, Hewen, et al.
Published: (2026)

Hierarchical Feature Learning for Medical Point Clouds via State Space Model
by: Zhang, Guoqing, et al.
Published: (2025)

Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network
by: Zhang, Rui, et al.
Published: (2023)

Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation
by: Chen, Chen, et al.
Published: (2024)

BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder
by: Huang, Siquan, et al.
Published: (2026)

IPCV: Information-Preserving Compression for MLLM Visual Encoders
by: Chen, Yuan, et al.
Published: (2025)

Prompt-DAS: Annotation-Efficient Prompt Learning for Domain Adaptive Semantic Segmentation of Electron Microscopy Images
by: Chen, Jiabao, et al.
Published: (2025)

Implicit Counterfactual Learning for Audio-Visual Segmentation
by: Zha, Mingfeng, et al.
Published: (2025)

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
by: Yang, Chuanguang, et al.
Published: (2025)

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
by: Tang, Zhicong, et al.
Published: (2023)

Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning
by: Li, Yiheng, et al.
Published: (2025)

MagicFuse: Single Image Fusion for Visual and Semantic Reinforcement
by: Zhang, Hao, et al.
Published: (2026)

Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
by: Wang, Zhongqi, et al.
Published: (2025)

EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
by: Ma, Mingjie, et al.
Published: (2024)