Bibliographic Details
Main Authors: Kimura, Subaru; Tanaka, Ryota; Miyawaki, Shumpei; Suzuki, Jun; Sakaguchi, Keisuke
Format: Preprint
Published: 2024
Subjects: Computation and Language; Cryptography and Security; Machine Learning
Online Access: https://arxiv.org/abs/2408.03554
author Kimura, Subaru
Tanaka, Ryota
Miyawaki, Shumpei
Suzuki, Jun
Sakaguchi, Keisuke
contents We explore visual prompt injection (VPI), an attack that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method, "goal hijacking via visual prompt injection" (GHVPI), that switches the task an LVLM executes from the original task to an alternative task designated by an attacker. Our quantitative analysis indicates that GPT-4V is vulnerable to GHVPI, with a notable attack success rate of 15.8%, a security risk that cannot be ignored. Our analysis also shows that successful GHVPI requires high character recognition capability and instruction-following ability in LVLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2408_03554
institution arXiv
publishDate 2024
record_format arxiv
title Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection
topic Computation and Language
Cryptography and Security
Machine Learning
url https://arxiv.org/abs/2408.03554