| Main Authors: | Herzog, Jonas, Wang, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.16100 |
Similar Items
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
by: Mistretta, Marco, et al.
Published: (2025)
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
by: Herzog, Jonas
Published: (2024)
CLIP Adaptation by Intra-modal Overlap Reduction
by: Kravets, Alexey, et al.
Published: (2024)
Extract Free Dense Misalignment from CLIP
by: Nam, JeongYeon, et al.
Published: (2024)
FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection
by: Chen, Yulin, et al.
Published: (2025)
IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment
by: Magistri, Simone, et al.
Published: (2026)
Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP
by: Niss, Laura, et al.
Published: (2024)
Rethinking Visual Token Reduction in LVLMs Under Cross-Modal Misalignment
by: Xu, Rui, et al.
Published: (2025)
IntraStyler: Intra-Domain Style Synthesis for Cross-Modality MRI Domain Adaptation
by: Liu, Han, et al.
Published: (2026)
Pic@Point: Cross-Modal Learning by Local and Global Point-Picture Correspondence
by: Herzog, Vencia, et al.
Published: (2024)
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
by: Cai, Yichao, et al.
Published: (2025)
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
by: Wang, Dianyi, et al.
Published: (2025)
CLIP-Free, Label Free, Unsupervised Concept Bottleneck Models
by: Sammani, Fawaz, et al.
Published: (2025)
Enhancing CLIP Robustness via Cross-Modality Alignment
by: Zhu, Xingyu, et al.
Published: (2025)
Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies
by: Tian, Mulin, et al.
Published: (2023)
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
by: Zhang, Le, et al.
Published: (2023)
Robust Self-Supervised Cross-Modal Super-Resolution against Real-World Misaligned Observations
by: Dong, Xiaoyu, et al.
Published: (2026)
CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification
by: Tan, Xuan, et al.
Published: (2024)
MMLGNet: Cross-Modal Alignment of Remote Sensing Data using CLIP
by: Chaudhary, Aditya, et al.
Published: (2026)
Preserving Cross-Modal Consistency for CLIP-based Class-Incremental Learning
by: Chen, Haoran, et al.
Published: (2025)
Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization
by: Leng, Jixuan, et al.
Published: (2023)
SuperCLIP: CLIP with Simple Classification Supervision
by: Zhao, Weiheng, et al.
Published: (2025)
Integrating MedCLIP and Cross-Modal Fusion for Automatic Radiology Report Generation
by: Han, Qianhao, et al.
Published: (2024)
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
by: Jeong, Sungheon, et al.
Published: (2024)
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP
by: Li, Fan, et al.
Published: (2025)
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)
MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
by: Wang, Yue, et al.
Published: (2025)
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
by: Huang, Weiquan, et al.
Published: (2024)
CLIP4VI-ReID: Learning Modality-shared Representations via CLIP Semantic Bridge for Visible-Infrared Person Re-identification
by: Yang, Xiaomei, et al.
Published: (2025)
Long-CLIP: Unlocking the Long-Text Capability of CLIP
by: Zhang, Beichen, et al.
Published: (2024)
IPAD-CLIP: Teaching CLIP to Detect Image Local Perceptual Artifacts
by: Wang, Juan, et al.
Published: (2026)
ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
by: Wang, Jingyun, et al.
Published: (2024)
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
by: Zhu, Wencheng, et al.
Published: (2025)
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
by: Sun, Quan, et al.
Published: (2024)
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
by: Yi, Chao, et al.
Published: (2024)
WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art
by: Ghildyal, Abhijay, et al.
Published: (2025)
CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images
by: Sun, Xinjie, et al.
Published: (2024)
The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models
by: Niss, Laura, et al.
Published: (2024)
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
by: Su, Dayong, et al.
Published: (2025)
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
by: Lan, Mengcheng, et al.
Published: (2024)