| Main Authors: | Liu, Fengyuan, Luo, Haochen, Li, Yiming, Torr, Philip, Gu, Jindong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.02697 |
Similar Items
An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models
by: Luo, Haochen, et al.
Published: (2024)
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
by: Li, Hang, et al.
Published: (2023)
Influencer Backdoor Attack on Semantic Segmentation
by: Lan, Haoheng, et al.
Published: (2023)
Attribution as Retrieval: Model-Agnostic AI-Generated Image Attribution
by: Wang, Hongsong, et al.
Published: (2026)
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
by: Gao, Kuofeng, et al.
Published: (2024)
Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models
by: Xing, Songlong, et al.
Published: (2026)
Improving Adversarial Transferability via Model Alignment
by: Ma, Avery, et al.
Published: (2023)
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
by: Hu, Anjun, et al.
Published: (2024)
PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
by: Wu, Yixuan, et al.
Published: (2025)
Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
by: Gao, Kuofeng, et al.
Published: (2024)
Latent Guard: a Safety Framework for Text-to-image Generation
by: Liu, Runtao, et al.
Published: (2024)
Layout Agnostic Scene Text Image Synthesis with Diffusion Models
by: Zhangli, Qilong, et al.
Published: (2024)
Learning Visual Prompts for Guiding the Attention of Vision Transformers
by: Rezaei, Razieh, et al.
Published: (2024)
Detecting Origin Attribution for Text-to-Image Diffusion Models
by: Xu, Katherine, et al.
Published: (2024)
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
by: Liu, Runtao, et al.
Published: (2024)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation
by: Liu, Ziyue, et al.
Published: (2025)
A Survey on Responsible Generative AI: What to Generate and What Not
by: Gu, Jindong
Published: (2024)
Mitigating Bias Using Model-Agnostic Data Attribution
by: De Coninck, Sander, et al.
Published: (2024)
PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks
by: Xu, Jingning, et al.
Published: (2026)
Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?
by: Zhang, Likun, et al.
Published: (2024)
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
by: Chen, Shuo, et al.
Published: (2023)
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
by: Han, Junlin, et al.
Published: (2024)
True Multimodal In-Context Learning Needs Attention to the Visual Context
by: Chen, Shuo, et al.
Published: (2025)
Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
by: Yuan, Jianhao, et al.
Published: (2022)
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
by: Shi, Fengyuan, et al.
Published: (2023)
Multimodal Pragmatic Jailbreak on Text-to-image Models
by: Liu, Tong, et al.
Published: (2024)
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
by: He, Zijian, et al.
Published: (2024)
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
by: Guo, Mingyi, et al.
Published: (2024)
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
by: Li, Runjia, et al.
Published: (2025)
A Survey on Transferability of Adversarial Examples across Deep Neural Networks
by: Gu, Jindong, et al.
Published: (2023)
Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models
by: Cheng, Hao, et al.
Published: (2024)
Learnable Sparsity for Vision Generative Models
by: Zhang, Yang, et al.
Published: (2024)
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
by: Cheng, Hao, et al.
Published: (2024)
Learning a General Model: Folding Clothing with Topological Dynamics
by: Liu, Yiming, et al.
Published: (2025)
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
by: Yang, Fengyuan, et al.
Published: (2024)
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
by: Liu, Xin, et al.
Published: (2023)
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
by: Chen, Yiming, et al.
Published: (2025)
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
by: Chen, Haokun, et al.
Published: (2024)
Localizing Events in Videos with Multimodal Queries
by: Zhang, Gengyuan, et al.
Published: (2024)
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
by: Li, Xiang, et al.
Published: (2024)