:: Library Catalog

Bibliographic Details
Main Authors:	Yi, Chao; He, Yu-Hang; Zhan, De-Chuan; Ye, Han-Jia
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.13797

Similar Items

Learning without Forgetting for Vision-Language Models
by: Zhou, Da-Wei, et al.
Published: (2023)

BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning
by: Li, Lan, et al.
Published: (2025)

PILOT: A Pre-Trained Model-Based Continual Learning Toolbox
by: Sun, Hai-Long, et al.
Published: (2023)

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
by: Zhou, Da-Wei, et al.
Published: (2024)

Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts
by: Li, Lan, et al.
Published: (2025)

Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification
by: Li, Yiqiao, et al.
Published: (2026)

Continual Learning with Pre-Trained Models: A Survey
by: Zhou, Da-Wei, et al.
Published: (2024)

Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
by: Zhou, Da-Wei, et al.
Published: (2024)

Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
by: Zhou, Da-Wei, et al.
Published: (2023)

TV100: A TV Series Dataset that Pre-Trained CLIP Has Not Seen
by: Zhou, Da-Wei, et al.
Published: (2024)

Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation
by: Chen, Zhi-Kai, et al.
Published: (2025)

MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning
by: Sun, Hai-Long, et al.
Published: (2024)

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
by: Yi, Chao, et al.
Published: (2024)

Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning
by: Qi, Zhi-Hong, et al.
Published: (2024)

OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions
by: Zhang, Yi-Kai, et al.
Published: (2024)

External Knowledge Injection for CLIP-Based Class-Incremental Learning
by: Zhou, Da-Wei, et al.
Published: (2025)

Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
by: Schrodi, Simon, et al.
Published: (2024)

Class-Incremental Learning: A Survey
by: Zhou, Da-Wei, et al.
Published: (2023)

Déjà Vu Memorization in Vision-Language Models
by: Jayaraman, Bargav, et al.
Published: (2024)

Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
by: Naharas, Nilay, et al.
Published: (2025)

X-VILA: Cross-Modality Alignment for Large Language Model
by: Ye, Hanrong, et al.
Published: (2024)

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
by: Huang, Wenxuan, et al.
Published: (2025)

Quantifying Cross-Modality Memorization in Vision-Language Models
by: Wen, Yuxin, et al.
Published: (2025)

Improved Alignment of Modalities in Large Vision Language Models
by: Jangra, Kartik, et al.
Published: (2025)

Rethinking Fine-Tuning: Unlocking Hidden Capabilities in Vision-Language Models
by: Zhang, Mingyuan, et al.
Published: (2025)

Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
by: Hu, Rui, et al.
Published: (2025)

MMRL: Multi-Modal Representation Learning for Vision-Language Models
by: Guo, Yuncheng, et al.
Published: (2025)

Information Router for Mitigating Modality Dominance in Vision-Language Models
by: Kim, Seulgi, et al.
Published: (2026)

Cross-Modal Adapter: Parameter-Efficient Transfer Learning Approach for Vision-Language Models
by: Yang, Juncheng, et al.
Published: (2024)

Bridging the Gap Between Multimodal Foundation Models and World Models
by: He, Xuehai
Published: (2025)

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
by: Xu, Zhongxing, et al.
Published: (2024)

Ramen: Robust Test-Time Adaptation of Vision-Language Models with Active Sample Selection
by: Bao, Wenxuan, et al.
Published: (2026)

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training
by: Zhang, Wenyu, et al.
Published: (2024)

Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning
by: Ye, Bo, et al.
Published: (2023)

Tactile Modality Fusion for Vision-Language-Action Models
by: Morissette, Charlotte, et al.
Published: (2026)

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
by: Su, Hung-Ting, et al.
Published: (2024)

Bridging Vision and Language Spaces with Assignment Prediction
by: Park, Jungin, et al.
Published: (2024)

AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
by: Wang, Yucen, et al.
Published: (2024)

Multi-Modal Adapter for Vision-Language Models
by: Seputis, Dominykas, et al.
Published: (2024)

Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)