Saved in:
Bibliographic Details
Main Authors: Choi, In Chong, Zhang, Jiacheng, Liu, Feng, Song, Yiliao
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.14399
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • Multi-turn jailbreak attacks have proven effective against text-only large language models (LLMs), where malicious content is gradually introduced to bypass safety alignment. However, effectively extending such attacks to large vision-language models (LVLMs) remains underexplored. In this paper, we find that naively incorporating visual inputs can make multi-turn jailbreaks easier to defend against; for example, overly malicious visual content will easily trigger the defense mechanism in safety-aligned LVLMs, resulting in more conservative responses. Based on this finding, we propose multi-turn adaptive prompting attack (MAPA) that 1) at each turn, alternates text-vision attack actions to elicit the most malicious response; and 2) across turns, adjusts the attack trajectory through iterative back-and-forth refinement to gradually amplify response maliciousness. This two-level design enables MAPA to consistently outperform state-of-the-art methods, improving attack success rates by 15-30% on recent benchmarks against LLaVA-v1.6-Mistral-7B, Qwen2.5-VL-7B-Instruct, Llama-3.2-Vision-11B-Instruct and GPT-4o-mini. Our code is available at: https://github.com/thomaschoi143/MAPA.