Bibliographic Details
Main Authors:	Zhang, Ruoyu; Wang, Lulu; He, Yi; Pan, Tongling; Yu, Zhengtao; Li, Yingna
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.11024

Similar Items

CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification
by: Wang, Qijie, et al.
Published: (2024)

MeaCap: Memory-Augmented Zero-shot Image Captioning
by: Zeng, Zequn, et al.
Published: (2024)

Negative Entity Suppression for Zero-Shot Captioning with Synthetic Images
by: Lu, Zimao, et al.
Published: (2025)

Image-Caption Encoding for Improving Zero-Shot Generalization
by: Yu, Eric Yang, et al.
Published: (2024)

MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
by: Xu, Chaoran, et al.
Published: (2026)

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
by: Luo, Jianjie, et al.
Published: (2024)

DeltaDeno: Zero-Shot Anomaly Generation via Delta-Denoising Attribution
by: Xu, Chaoran, et al.
Published: (2025)

Synthetic Captions for Open-Vocabulary Zero-Shot Segmentation
by: Lebailly, Tim, et al.
Published: (2025)

One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
by: Bianchi, Lorenzo, et al.
Published: (2025)

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
by: Qu, Zhen, et al.
Published: (2025)

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
by: Zhang, Guanghao, et al.
Published: (2025)

Are Image-to-Video Models Good Zero-Shot Image Editors?
by: Zhang, Zechuan, et al.
Published: (2025)

Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
by: Fonseca, Rui, et al.
Published: (2025)

FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach
by: Rani, Anju, et al.
Published: (2025)

MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)

MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement
by: Wang, Yingying, et al.
Published: (2025)

Double Banking on Knowledge: Customized Modulation and Prototypes for Multi-Modality Semi-supervised Medical Image Segmentation
by: Chen, Yingyu, et al.
Published: (2024)

Shot2Tactic-Caption: Multi-Scale Captioning of Badminton Videos for Tactical Understanding
by: Ding, Ning, et al.
Published: (2025)

RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning
by: Ma, Yunchuan, et al.
Published: (2024)

CoPS: Conditional Prompt Synthesis for Zero-Shot Anomaly Detection
by: Chen, Qiyu, et al.
Published: (2025)

Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning
by: Wu, Xiangyu, et al.
Published: (2025)

MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification
by: Qu, Xiangyan, et al.
Published: (2025)

SGCap: Decoding Semantic Group for Zero-shot Video Captioning
by: Pan, Zeyu, et al.
Published: (2025)

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
by: Xing, Long, et al.
Published: (2025)

SegTTA: Training-Free Test-Time Augmentation for Zero-Shot Medical Imaging Segmentation
by: Yao, Yihong, et al.
Published: (2026)

Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)

CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
by: Zhang, Mingkun, et al.
Published: (2025)

The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
by: Tian, Mingkai, et al.
Published: (2025)

Unlocking Zero-Shot Plant Segmentation with Pl@ntNet Intelligence
by: Ravé, Simon, et al.
Published: (2025)

Multi-turn Physics-informed Vision-language Model for Physics-grounded Anomaly Detection
by: Gu, Yao, et al.
Published: (2026)

DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
by: Wu, Hao, et al.
Published: (2024)

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
by: Zhang, Lin, et al.
Published: (2025)

DreamInsert: Zero-Shot Image-to-Video Object Insertion from A Single Image
by: Zhao, Qi, et al.
Published: (2025)

LongCaptioning: Unlocking the Power of Long Video Caption Generation in Large Multimodal Models
by: Wei, Hongchen, et al.
Published: (2025)

AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models
by: Qu, Zhen, et al.
Published: (2026)

MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation
by: Caldeira, Eduarda, et al.
Published: (2025)

TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)

ZIM: Zero-Shot Image Matting for Anything
by: Kim, Beomyoung, et al.
Published: (2024)

CHITNet: A Complementary to Harmonious Information Transfer Network for Infrared and Visible Image Fusion
by: Du, Keying, et al.
Published: (2023)