Tirohanga kaimahi: :: Library Catalog

I tiakina i:

Ngā taipitopito rārangi puna kōrero
Ngā kaituhi matua:	Li, Juncheng, Li, Yige, Huang, Hanxun, Chen, Yunhao, Wang, Xin, Wang, Yixu, Ma, Xingjun, Jiang, Yu-Gang
Hōputu:	Preprint
I whakaputaina:	2025
Ngā marau:	Computer Vision and Pattern Recognition
Urunga tuihono:	https://arxiv.org/abs/2511.18921
Ngā Tūtohu:	Tāpirihia he Tūtohu Kāore He Tūtohu, Me noho koe te mea tuatahi ki te tūtohu i tēnei pūkete!

_version_	1866915634960400384
author	Li, Juncheng Li, Yige Huang, Hanxun Chen, Yunhao Wang, Xin Wang, Yixu Ma, Xingjun Jiang, Yu-Gang
author_facet	Li, Juncheng Li, Yige Huang, Hanxun Chen, Yunhao Wang, Xin Wang, Yixu Ma, Xingjun Jiang, Yu-Gang
contents	Backdoor attacks undermine the reliability and trustworthiness of machine learning systems by injecting hidden behaviors that can be maliciously activated at inference time. While such threats have been extensively studied in unimodal settings, their impact on multimodal foundation models, particularly vision-language models (VLMs), remains largely underexplored. In this work, we introduce \textbf{BackdoorVLM}, the first comprehensive benchmark for systematically evaluating backdoor attacks on VLMs across a broad range of settings. It adopts a unified perspective that injects and analyzes backdoors across core vision-language tasks, including image captioning and visual question answering. BackdoorVLM organizes multimodal backdoor threats into 5 representative categories: targeted refusal, malicious injection, jailbreak, concept substitution, and perceptual hijack. Each category captures a distinct pathway through which an adversary can manipulate a model's behavior. We evaluate these threats using 12 representative attack methods spanning text, image, and bimodal triggers, tested on 2 open-source VLMs and 3 multimodal datasets. Our analysis reveals that VLMs exhibit strong sensitivity to textual instructions, and in bimodal backdoors the text trigger typically overwhelms the image trigger when forming the backdoor mapping. Notably, backdoors involving the textual modality remain highly potent, with poisoning rates as low as 1\% yielding over 90\% success across most tasks. These findings highlight significant, previously underexplored vulnerabilities in current VLMs. We hope that BackdoorVLM can serve as a useful benchmark for analyzing and mitigating multimodal backdoor threats. Code is available at: https://github.com/bin015/BackdoorVLM .
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_18921
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models Li, Juncheng Li, Yige Huang, Hanxun Chen, Yunhao Wang, Xin Wang, Yixu Ma, Xingjun Jiang, Yu-Gang Computer Vision and Pattern Recognition Backdoor attacks undermine the reliability and trustworthiness of machine learning systems by injecting hidden behaviors that can be maliciously activated at inference time. While such threats have been extensively studied in unimodal settings, their impact on multimodal foundation models, particularly vision-language models (VLMs), remains largely underexplored. In this work, we introduce \textbf{BackdoorVLM}, the first comprehensive benchmark for systematically evaluating backdoor attacks on VLMs across a broad range of settings. It adopts a unified perspective that injects and analyzes backdoors across core vision-language tasks, including image captioning and visual question answering. BackdoorVLM organizes multimodal backdoor threats into 5 representative categories: targeted refusal, malicious injection, jailbreak, concept substitution, and perceptual hijack. Each category captures a distinct pathway through which an adversary can manipulate a model's behavior. We evaluate these threats using 12 representative attack methods spanning text, image, and bimodal triggers, tested on 2 open-source VLMs and 3 multimodal datasets. Our analysis reveals that VLMs exhibit strong sensitivity to textual instructions, and in bimodal backdoors the text trigger typically overwhelms the image trigger when forming the backdoor mapping. Notably, backdoors involving the textual modality remain highly potent, with poisoning rates as low as 1\% yielding over 90\% success across most tasks. These findings highlight significant, previously underexplored vulnerabilities in current VLMs. We hope that BackdoorVLM can serve as a useful benchmark for analyzing and mitigating multimodal backdoor threats. Code is available at: https://github.com/bin015/BackdoorVLM .
title	BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2511.18921

Ngā tūemi rite