Staff View: ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Guan-Yan; Cheng, Tzu-Yu; Teng, Ya-Wen; Wanga, Farn; Yeh, Kuo-Hui
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2510.10281

_version_	1866912669615783936
author	Yang, Guan-Yan; Cheng, Tzu-Yu; Teng, Ya-Wen; Wanga, Farn; Yeh, Kuo-Hui
author_facet	Yang, Guan-Yan; Cheng, Tzu-Yu; Teng, Ya-Wen; Wanga, Farn; Yeh, Kuo-Hui
contents	The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data representations. This paper introduces ArtPerception, a novel black-box jailbreak framework that strategically leverages ASCII art to bypass the security measures of state-of-the-art (SOTA) LLMs. Unlike prior methods that rely on iterative, brute-force attacks, ArtPerception introduces a systematic, two-phase methodology. Phase 1 conducts a one-time, model-specific pre-test to empirically determine the optimal parameters for ASCII art recognition. Phase 2 leverages these insights to launch a highly efficient, one-shot malicious jailbreak attack. We propose a Modified Levenshtein Distance (MLD) metric for a more nuanced evaluation of an LLM's recognition capability. Through comprehensive experiments on four SOTA open-source LLMs, we demonstrate superior jailbreak performance. We further validate our framework's real-world relevance by showing its successful transferability to leading commercial models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, and by conducting a rigorous effectiveness analysis against potential defenses such as LLaMA Guard and Azure's content filters. Our findings underscore that true LLM security requires defending against a multi-modal space of interpretations, even within text-only inputs, and highlight the effectiveness of strategic, reconnaissance-based attacks. Content Warning: This paper includes potentially harmful and offensive model outputs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_10281
institution	arXiv
publishDate	2025
record_format	arxiv
title	ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
topic	Cryptography and Security; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2510.10281

Similar Items