Library Catalog

Bibliographic Details
Main Authors:	Liu, Fengyuan; Luo, Haochen; Li, Yiming; Torr, Philip; Gu, Jindong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.02697

Similar Items

An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models
by: Luo, Haochen, et al.
Published: (2024)

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
by: Li, Hang, et al.
Published: (2023)

Influencer Backdoor Attack on Semantic Segmentation
by: Lan, Haoheng, et al.
Published: (2023)

Attribution as Retrieval: Model-Agnostic AI-Generated Image Attribution
by: Wang, Hongsong, et al.
Published: (2026)

Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
by: Gao, Kuofeng, et al.
Published: (2024)

Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models
by: Xing, Songlong, et al.
Published: (2026)

Improving Adversarial Transferability via Model Alignment
by: Ma, Avery, et al.
Published: (2023)

As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
by: Hu, Anjun, et al.
Published: (2024)

PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
by: Wu, Yixuan, et al.
Published: (2025)

Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
by: Gao, Kuofeng, et al.
Published: (2024)

Latent Guard: a Safety Framework for Text-to-image Generation
by: Liu, Runtao, et al.
Published: (2024)

Layout Agnostic Scene Text Image Synthesis with Diffusion Models
by: Zhangli, Qilong, et al.
Published: (2024)

Learning Visual Prompts for Guiding the Attention of Vision Transformers
by: Rezaei, Razieh, et al.
Published: (2024)

Detecting Origin Attribution for Text-to-Image Diffusion Models
by: Xu, Katherine, et al.
Published: (2024)

AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
by: Liu, Runtao, et al.
Published: (2024)

Evaluating Attribute Confusion in Fashion Text-to-Image Generation
by: Liu, Ziyue, et al.
Published: (2025)

A Survey on Responsible Generative AI: What to Generate and What Not
by: Gu, Jindong
Published: (2024)

Mitigating Bias Using Model-Agnostic Data Attribution
by: De Coninck, Sander, et al.
Published: (2024)

PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks
by: Xu, Jingning, et al.
Published: (2026)

Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?
by: Zhang, Likun, et al.
Published: (2024)

Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
by: Chen, Shuo, et al.
Published: (2023)

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
by: Han, Junlin, et al.
Published: (2024)

True Multimodal In-Context Learning Needs Attention to the Visual Context
by: Chen, Shuo, et al.
Published: (2025)

Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
by: Yuan, Jianhao, et al.
Published: (2022)

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
by: Shi, Fengyuan, et al.
Published: (2023)

Multimodal Pragmatic Jailbreak on Text-to-image Models
by: Liu, Tong, et al.
Published: (2024)

WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
by: He, Zijian, et al.
Published: (2024)

CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
by: Guo, Mingyi, et al.
Published: (2024)

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
by: Li, Runjia, et al.
Published: (2025)

A Survey on Transferability of Adversarial Examples across Deep Neural Networks
by: Gu, Jindong, et al.
Published: (2023)

Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models
by: Cheng, Hao, et al.
Published: (2024)

Learnable Sparsity for Vision Generative Models
by: Zhang, Yang, et al.
Published: (2024)

Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
by: Cheng, Hao, et al.
Published: (2024)

Learning a General Model: Folding Clothing with Topological Dynamics
by: Liu, Yiming, et al.
Published: (2025)

KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
by: Yang, Fengyuan, et al.
Published: (2024)

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
by: Liu, Xin, et al.
Published: (2023)

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
by: Chen, Yiming, et al.
Published: (2025)

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
by: Chen, Haokun, et al.
Published: (2024)

Localizing Events in Videos with Multimodal Queries
by: Zhang, Gengyuan, et al.
Published: (2024)

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
by: Li, Xiang, et al.
Published: (2024)