| Main Author: | Yang, Fan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.10032 |
Similar Items
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
by: Zhou, Weikang, et al.
Published: (2024)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)
Distract Large Language Models for Automatic Jailbreak Attack
by: Xiao, Zeguan, et al.
Published: (2024)
CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
BiasJailbreak:Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models
by: Lee, Isack, et al.
Published: (2024)
Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
by: Sun, Xiongtao, et al.
Published: (2024)
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
by: Zhou, Yue, et al.
Published: (2024)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
by: Mu, Honglin, et al.
Published: (2024)
When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark
by: Ghoshal, Subha, et al.
Published: (2026)
Revisiting Jailbreaking for Large Language Models: A Representation Engineering Perspective
by: Li, Tianlong, et al.
Published: (2024)
The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models
by: Huang, Linghan, et al.
Published: (2025)
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)
Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing
by: Zhao, Yinzhi, et al.
Published: (2026)
Jailbreaking Large Language Models Through Content Concretization
by: Wahréus, Johan, et al.
Published: (2025)
Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology
by: Wang, Zhenhua, et al.
Published: (2024)
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
by: Liu, Xiaogeng, et al.
Published: (2023)
Round Trip Translation Defence against Large Language Model Jailbreaking Attacks
by: Yung, Canaan, et al.
Published: (2024)
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
by: Zhao, Shiji, et al.
Published: (2025)
Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models
by: Ke, Shih-Wen, et al.
Published: (2025)
Single-pass Detection of Jailbreaking Input in Large Language Models
by: Candogan, Leyla Naz, et al.
Published: (2025)
Is the System Message Really Important to Jailbreaks in Large Language Models?
by: Zou, Xiaotian, et al.
Published: (2024)
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
by: Bisconti, Piercosma, et al.
Published: (2025)
MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning
by: Zheng, Muyang, et al.
Published: (2025)
DynamicMind: A Tri-Mode Thinking System for Large Language Models
by: Li, Wei, et al.
Published: (2025)
When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
by: Shamsi, Zafir, et al.
Published: (2026)
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
by: Cao, Hongye, et al.
Published: (2025)
Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
by: Du, Y., et al.
Published: (2025)
Jailbreaking Large Language Models with Symbolic Mathematics
by: Bethany, Emet, et al.
Published: (2024)
Cequel: Cost-Effective Querying of Large Language Models for Text Clustering
by: Wang, Hongtao, et al.
Published: (2025)
THiNK: Can Large Language Models Think-aloud?
by: Yu, Yongan, et al.
Published: (2025)
Multi-Persona Thinking for Bias Mitigation in Large Language Models
by: Chen, Yuxing, et al.
Published: (2026)
Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models
by: Elenjical, Abraham Paul, et al.
Published: (2026)
Missed Connections: Lateral Thinking Puzzles for Large Language Models
by: Todd, Graham, et al.
Published: (2024)
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
by: Yan, Yuchen, et al.
Published: (2025)
Jailbreaking to Jailbreak
by: Kritz, Jeremy, et al.
Published: (2025)
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)