| Main Authors: | Zhao, Chenyang, Wang, Kun, Hsiao, Janet H., Chan, Antoni B. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.18816 |
Similar Items
Debunking Grad-ECLIP: A Comprehensive Study on Its Incorrectness and Fundamental Principles for Model Interpretation
by: Cui, Yongjin, et al.
Published: (2026)
Density-based Object Detection in Crowded Scenes
by: Zhao, Chenyang, et al.
Published: (2025)
Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
by: Lin, Wei, et al.
Published: (2025)
Probing CLIP's Comprehension of 360-Degree Textual and Visual Semantics
by: Wang, Hai, et al.
Published: (2026)
FG-CLIP: Fine-Grained Visual and Textual Alignment
by: Xie, Chunyu, et al.
Published: (2025)
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
by: Yang, Tianyu, et al.
Published: (2024)
MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection
by: Park, Jun Yeong, et al.
Published: (2026)
Guided AbsoluteGrad: Magnitude of Gradients Matters to Explanation's Localization and Saliency
by: Huang, Jun, et al.
Published: (2024)
MLLM-based Textual Explanations for Face Comparison
by: Sony, Redwan, et al.
Published: (2026)
CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting
by: Yu, Qinkai, et al.
Published: (2024)
3D Crowd Counting via Geometric Attention-guided Multi-View Fusion
by: Zhang, Qi, et al.
Published: (2020)
Learning Tracking Representations from Single Point Annotations
by: Wu, Qiangqiang, et al.
Published: (2024)
A Fixed-Point Approach to Unified Prompt-Based Counting
by: Lin, Wei, et al.
Published: (2024)
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
by: Wang, Jiuniu, et al.
Published: (2025)
Harnessing Textual Semantic Priors for Knowledge Transfer and Refinement in CLIP-Driven Continual Learning
by: He, Lingfeng, et al.
Published: (2025)
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
by: Yang, Ziyan, et al.
Published: (2022)
un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
by: Li, Yinqi, et al.
Published: (2025)
Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization
by: Zhang, Qi, et al.
Published: (2024)
WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art
by: Ghildyal, Abhijay, et al.
Published: (2025)
Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors
by: Lu, Haodong, et al.
Published: (2025)
Grad-CL: Source Free Domain Adaptation with Gradient Guided Feature Disalignment
by: Thakur, Rini Smita, et al.
Published: (2025)
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2023)
Is CLIP Cross-Eyed? Revealing and Mitigating Center Bias in the CLIP Family
by: Chew, Oscar, et al.
Published: (2026)
Zero-Shot Textual Explanations via Translating Decision-Critical Features
by: Yamauchi, Toshinori, et al.
Published: (2025)
SuperCLIP: CLIP with Simple Classification Supervision
by: Zhao, Weiheng, et al.
Published: (2025)
VisText-Mosquito: A Unified Multimodal Dataset for Visual Detection, Segmentation, and Textual Explanation on Mosquito Breeding Sites
by: Islam, Md. Adnanul, et al.
Published: (2025)
Zero-Shot Faithful Textual Explanations via Directional-Derivative Influence on Predictions
by: Yamauchi, Toshinori, et al.
Published: (2026)
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
by: Rusanovsky, Matan, et al.
Published: (2024)
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
by: Yin, Yuanyang, et al.
Published: (2024)
CLIP Model for Images to Textual Prompts Based on Top-k Neighbors
by: Zhang, Xin, et al.
Published: (2024)
Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
by: Liu, Jingming, et al.
Published: (2024)
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)
Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting
by: Kayser, Maxime, et al.
Published: (2024)
MIP: CLIP-based Image Reconstruction from PEFT Gradients
by: Zhou, Peiheng, et al.
Published: (2024)
Visual and Textual Prompts in VLLMs for Enhancing Emotion Recognition
by: Wang, Zhifeng, et al.
Published: (2025)
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
by: Vardi, Ben, et al.
Published: (2025)
GIFT: A Framework Towards Global Interpretable Faithful Textual Explanations of Vision Classifiers
by: Zablocki, Éloi, et al.
Published: (2024)
Fusion-CAM: Integrating Gradient and Region-Based Class Activation Maps for Robust Visual Explanations
by: Dekdegue, Hajar, et al.
Published: (2026)
An Experimental Study on Generating Plausible Textual Explanations for Video Summarization
by: Eleftheriadis, Thomas, et al.
Published: (2025)
TCP:Textual-based Class-aware Prompt tuning for Visual-Language Model
by: Yao, Hantao, et al.
Published: (2023)