:: Library Catalog

Bibliographic Details
Main Authors:	Han, Yucheng, Wang, Rui, Zhang, Chi, Hu, Juntao, Cheng, Pei, Fu, Bin, Zhang, Hanwang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.09162

Similar Items

Prompt-aligned Gradient for Prompt Tuning
by: Zhu, Beier, et al.
Published: (2022)

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
by: Hao, Yunzhuo, et al.
Published: (2025)

Dual-Modal Prompting for Sketch-Based Image Retrieval
by: Gao, Liying, et al.
Published: (2024)

EMMA: Efficient Visual Alignment in Multi-Modal LLMs
by: Ghazanfari, Sara, et al.
Published: (2024)

PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching
by: Nie, Han, et al.
Published: (2025)

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
by: Jha, Saurav, et al.
Published: (2024)

Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
by: Yang, Xu, et al.
Published: (2023)

DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
by: Jiao, Qirui, et al.
Published: (2025)

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
by: Zhu, Beier, et al.
Published: (2025)

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding
by: Li, Hongyu, et al.
Published: (2024)

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
by: Yang, Hongji, et al.
Published: (2025)

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
by: Hu, Xiwei, et al.
Published: (2024)

Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)

Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
by: Zhang, Hao, et al.
Published: (2024)

One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks
by: Guo, Ji, et al.
Published: (2024)

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
by: Han, Yucheng, et al.
Published: (2024)

VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
by: Cheng, Silin, et al.
Published: (2025)

Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
by: Xu, Katherine, et al.
Published: (2024)

Multi-Modal Prompt Learning on Blind Image Quality Assessment
by: Pan, Wensheng, et al.
Published: (2024)

TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
by: Jo, Sanghyun, et al.
Published: (2025)

Pretrained Image-Text Models are Secretly Video Captioners
by: Zhang, Chunhui, et al.
Published: (2025)

TextCraftor: Your Text Encoder Can be Image Quality Controller
by: Li, Yanyu, et al.
Published: (2024)

Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
by: Liu, Renyang, et al.
Published: (2025)

The CLIP Model is Secretly an Image-to-Prompt Converter
by: Ding, Yuxuan, et al.
Published: (2023)

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)

DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
by: Yang, Ruohong, et al.
Published: (2024)

Your Diffusion Model is Secretly a Certifiably Robust Classifier
by: Chen, Huanran, et al.
Published: (2024)

Your Pre-trained Diffusion Model Secretly Knows Restoration
by: Rajagopalan, Sudarshan, et al.
Published: (2026)

Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image
by: Wang, Yuxuan, et al.
Published: (2025)

Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation
by: Zhang, Chi, et al.
Published: (2026)

Image Super-Resolution with Text Prompt Diffusion
by: Chen, Zheng, et al.
Published: (2023)

Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
by: Zhu, Beier, et al.
Published: (2023)

SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
by: Zhang, Xu, et al.
Published: (2025)

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
by: Fei, Hao, et al.
Published: (2023)

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
by: Xia, Ruihao, et al.
Published: (2024)

Adaptively Clustering Neighbor Elements for Image-Text Generation
by: Wang, Zihua, et al.
Published: (2023)

Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated
by: Yang, Muli, et al.
Published: (2026)

Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
by: Chen, Xinyan, et al.
Published: (2023)

Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
by: Zhu, Beier, et al.
Published: (2025)

EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models
by: Li, Mingzhe, et al.
Published: (2025)