| Main Authors: | Han, Yucheng, Wang, Rui, Zhang, Chi, Hu, Juntao, Cheng, Pei, Fu, Bin, Zhang, Hanwang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.09162 |
Similar Items
Prompt-aligned Gradient for Prompt Tuning
by: Zhu, Beier, et al.
Published: (2022)
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
by: Hao, Yunzhuo, et al.
Published: (2025)
Dual-Modal Prompting for Sketch-Based Image Retrieval
by: Gao, Liying, et al.
Published: (2024)
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
by: Ghazanfari, Sara, et al.
Published: (2024)
PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching
by: Nie, Han, et al.
Published: (2025)
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
by: Jha, Saurav, et al.
Published: (2024)
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
by: Yang, Xu, et al.
Published: (2023)
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
by: Jiao, Qirui, et al.
Published: (2025)
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
by: Zhu, Beier, et al.
Published: (2025)
Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding
by: Li, Hongyu, et al.
Published: (2024)
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
by: Yang, Hongji, et al.
Published: (2025)
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
by: Hu, Xiwei, et al.
Published: (2024)
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
by: Zhang, Hao, et al.
Published: (2024)
One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks
by: Guo, Ji, et al.
Published: (2024)
Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
by: Han, Yucheng, et al.
Published: (2024)
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
by: Cheng, Silin, et al.
Published: (2025)
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
by: Xu, Katherine, et al.
Published: (2024)
Multi-Modal Prompt Learning on Blind Image Quality Assessment
by: Pan, Wensheng, et al.
Published: (2024)
TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
by: Jo, Sanghyun, et al.
Published: (2025)
Pretrained Image-Text Models are Secretly Video Captioners
by: Zhang, Chunhui, et al.
Published: (2025)
TextCraftor: Your Text Encoder Can be Image Quality Controller
by: Li, Yanyu, et al.
Published: (2024)
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
by: Liu, Renyang, et al.
Published: (2025)
The CLIP Model is Secretly an Image-to-Prompt Converter
by: Ding, Yuxuan, et al.
Published: (2023)
CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
by: Yang, Ruohong, et al.
Published: (2024)
Your Diffusion Model is Secretly a Certifiably Robust Classifier
by: Chen, Huanran, et al.
Published: (2024)
Your Pre-trained Diffusion Model Secretly Knows Restoration
by: Rajagopalan, Sudarshan, et al.
Published: (2026)
Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image
by: Wang, Yuxuan, et al.
Published: (2025)
Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation
by: Zhang, Chi, et al.
Published: (2026)
Image Super-Resolution with Text Prompt Diffusion
by: Chen, Zheng, et al.
Published: (2023)
Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
by: Zhu, Beier, et al.
Published: (2023)
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
by: Zhang, Xu, et al.
Published: (2025)
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
by: Fei, Hao, et al.
Published: (2023)
Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
by: Xia, Ruihao, et al.
Published: (2024)
Adaptively Clustering Neighbor Elements for Image-Text Generation
by: Wang, Zihua, et al.
Published: (2023)
Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated
by: Yang, Muli, et al.
Published: (2026)
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
by: Chen, Xinyan, et al.
Published: (2023)
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
by: Zhu, Beier, et al.
Published: (2025)
EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models
by: Li, Mingzhe, et al.
Published: (2025)