| Main Authors: | Zhang, Yuhan, Ma, Guoqing, Hao, Guangfu, Guo, Liangxuan, Chen, Yang, Yu, Shan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.05555 |
Similar Items
Flexible Tool Selection through Low-dimensional Attribute Alignment of Vision and Language
by: Hao, Guangfu, et al.
Published: (2025)
Out-of-distribution forgetting: vulnerability of continual learning to intra-class distribution shift
by: Guo, Liangxuan, et al.
Published: (2023)
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
by: Guo, Qingpei, et al.
Published: (2024)
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering
by: Chen, Yuhan, et al.
Published: (2024)
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
by: Liu, Yanqing, et al.
Published: (2025)
3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
by: Yan, Siming, et al.
Published: (2023)
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
by: Gao, Yuan, et al.
Published: (2025)
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
by: Ma, Guoqing, et al.
Published: (2026)
Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)
Contrastive Pretraining with Dual Visual Encoders for Gloss-Free Sign Language Translation
by: Sincan, Ozge Mercanoglu, et al.
Published: (2025)
FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)
Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
by: Schäfer, Lukas, et al.
Published: (2023)
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
by: Xue, Rongkun, et al.
Published: (2024)
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
by: Chen, Yuxin, et al.
Published: (2024)
AlignTok: Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
by: Chen, Bowei, et al.
Published: (2025)
Object-Centric Pretraining via Target Encoder Bootstrapping
by: Đukić, Nikola, et al.
Published: (2025)
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
by: Chen, Chaofeng, et al.
Published: (2023)
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
by: Zhang, Yuan, et al.
Published: (2026)
Granulon: Awakening Pixel-Level Visual Encoders with Adaptive Multi-Granularity Semantics for MLLM
by: Mao, Junyuan, et al.
Published: (2026)
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback
by: Chen, Yang, et al.
Published: (2025)
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
by: Su, Zhaochen, et al.
Published: (2025)
Efficient Image Synthesis with Sphere Latent Encoder
by: Do, Tung, et al.
Published: (2026)
GSE: Evaluating Sticker Visual Semantic Similarity via a General Sticker Encoder
by: Chee, Heng Er Metilda, et al.
Published: (2025)
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
by: Zhang, Xuyang, et al.
Published: (2025)
Negative Prototypes Guided Contrastive Learning for WSOD
by: Zhang, Yu, et al.
Published: (2024)
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)
A Cascaded Information Interaction Network for Precise Image Segmentation
by: Xiao, Hewen, et al.
Published: (2026)
Hierarchical Feature Learning for Medical Point Clouds via State Space Model
by: Zhang, Guoqing, et al.
Published: (2025)
Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network
by: Zhang, Rui, et al.
Published: (2023)
Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation
by: Chen, Chen, et al.
Published: (2024)
BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder
by: Huang, Siquan, et al.
Published: (2026)
IPCV: Information-Preserving Compression for MLLM Visual Encoders
by: Chen, Yuan, et al.
Published: (2025)
Prompt-DAS: Annotation-Efficient Prompt Learning for Domain Adaptive Semantic Segmentation of Electron Microscopy Images
by: Chen, Jiabao, et al.
Published: (2025)
Implicit Counterfactual Learning for Audio-Visual Segmentation
by: Zha, Mingfeng, et al.
Published: (2025)
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
by: Yang, Chuanguang, et al.
Published: (2025)
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
by: Tang, Zhicong, et al.
Published: (2023)
Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning
by: Li, Yiheng, et al.
Published: (2025)
MagicFuse: Single Image Fusion for Visual and Semantic Reinforcement
by: Zhang, Hao, et al.
Published: (2026)
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
by: Wang, Zhongqi, et al.
Published: (2025)
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
by: Ma, Mingjie, et al.
Published: (2024)