Bibliographic Details
Main Author:	Yang, Fan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.10032

Similar Items

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
by: Zhou, Weikang, et al.
Published: (2024)

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)

LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)

Distract Large Language Models for Automatic Jailbreak Attack
by: Xiao, Zeguan, et al.
Published: (2024)

CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)

BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models
by: Lee, Isack, et al.
Published: (2024)

Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
by: Sun, Xiongtao, et al.
Published: (2024)

Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
by: Zhou, Yue, et al.
Published: (2024)

Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
by: Mu, Honglin, et al.
Published: (2024)

When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark
by: Ghoshal, Subha, et al.
Published: (2026)

Revisiting Jailbreaking for Large Language Models: A Representation Engineering Perspective
by: Li, Tianlong, et al.
Published: (2024)

The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models
by: Huang, Linghan, et al.
Published: (2025)

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)

Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing
by: Zhao, Yinzhi, et al.
Published: (2026)

Jailbreaking Large Language Models Through Content Concretization
by: Wahréus, Johan, et al.
Published: (2025)

Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology
by: Wang, Zhenhua, et al.
Published: (2024)

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
by: Liu, Xiaogeng, et al.
Published: (2023)

Round Trip Translation Defence against Large Language Model Jailbreaking Attacks
by: Yung, Canaan, et al.
Published: (2024)

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
by: Zhao, Shiji, et al.
Published: (2025)

Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models
by: Ke, Shih-Wen, et al.
Published: (2025)

Single-pass Detection of Jailbreaking Input in Large Language Models
by: Candogan, Leyla Naz, et al.
Published: (2025)

Is the System Message Really Important to Jailbreaks in Large Language Models?
by: Zou, Xiaotian, et al.
Published: (2024)

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
by: Bisconti, Piercosma, et al.
Published: (2025)

MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning
by: Zheng, Muyang, et al.
Published: (2025)

DynamicMind: A Tri-Mode Thinking System for Large Language Models
by: Li, Wei, et al.
Published: (2025)

When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
by: Shamsi, Zafir, et al.
Published: (2026)

SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
by: Cao, Hongye, et al.
Published: (2025)

Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
by: Du, Y., et al.
Published: (2025)

Jailbreaking Large Language Models with Symbolic Mathematics
by: Bethany, Emet, et al.
Published: (2024)

Cequel: Cost-Effective Querying of Large Language Models for Text Clustering
by: Wang, Hongtao, et al.
Published: (2025)

THiNK: Can Large Language Models Think-aloud?
by: Yu, Yongan, et al.
Published: (2025)

Multi-Persona Thinking for Bias Mitigation in Large Language Models
by: Chen, Yuxing, et al.
Published: (2026)

Think²: Grounded Metacognitive Reasoning in Large Language Models
by: Elenjical, Abraham Paul, et al.
Published: (2026)

Missed Connections: Lateral Thinking Puzzles for Large Language Models
by: Todd, Graham, et al.
Published: (2024)

Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)

InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
by: Yan, Yuchen, et al.
Published: (2025)

Jailbreaking to Jailbreak
by: Kritz, Jeremy, et al.
Published: (2025)

ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)