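Staff View: Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection :: Library Catalog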

Bibliographic Details
Main Authors:	Kimura, Subaru; Tanaka, Ryota; Miyawaki, Shumpei; Suzuki, Jun; Sakaguchi, Keisuke
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2408.03554

_version_	1866916349743202304
author	Kimura, Subaru; Tanaka, Ryota; Miyawaki, Shumpei; Suzuki, Jun; Sakaguchi, Keisuke
author_facet	Kimura, Subaru; Tanaka, Ryota; Miyawaki, Shumpei; Suzuki, Jun; Sakaguchi, Keisuke
contents	We explore visual prompt injection (VPI) that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method, "goal hijacking via visual prompt injection" (GHVPI), that swaps the execution task of LVLMs from an original task to an alternative task designated by an attacker. The quantitative analysis indicates that GPT-4V is vulnerable to the GHVPI and demonstrates a notable attack success rate of 15.8%, which is an unignorable security risk. Our analysis also shows that successful GHVPI requires high character recognition capability and instruction-following ability in LVLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_03554
institution	arXiv
publishDate	2024
record_format	arxiv
title	Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection
topic	Computation and Language; Cryptography and Security; Machine Learning
url	https://arxiv.org/abs/2408.03554
