| Main Author: | Mukhopadhyay, Snehasis |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.27545 |
Similar Items
DeceptGuard: A Constitutional Oversight Framework For Detecting Deception in LLM Agents
by: Mukhopadhyay, Snehasis
Published: (2026)
Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)
Who Benefits From Sinus Surgery? Comparing Generative AI and Supervised Machine Learning for Predicting Surgical Outcomes in Chronic Rhinosinusitis
by: Chowdhury, Sayeed Shafayet, et al.
Published: (2026)
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
by: Feng, Mingqian, et al.
Published: (2026)
DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
by: Lucas, Jason, et al.
Published: (2026)
Quantification of Tenseness in English and Japanese Tense-Lax Vowels: A Lagrangian Model with Indicator θ1 and Force of Tenseness Ftense(t)
by: Ishizaki, Tatsuya
Published: (2025)
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
by: Takemoto, Kazuhiro
Published: (2024)
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
by: Liu, Xuannan, et al.
Published: (2024)
Jailbreaking Attack against Multimodal Large Language Model
by: Niu, Zhenxing, et al.
Published: (2024)
AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
by: Mukhopadhyay, Snehasis, et al.
Published: (2025)
Enhancing Jailbreak Attacks with Diversity Guidance
by: Zhang, Xu, et al.
Published: (2024)
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
by: You, Wenhao, et al.
Published: (2025)
Efficient LLM-Jailbreaking via Multimodal-LLM Jailbreak
by: Ji, Haoxuan, et al.
Published: (2024)
Jailbreaking Large Language Models with Morality Attacks
by: Su, Ying, et al.
Published: (2026)
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
by: Luo, Xuan, et al.
Published: (2025)
The Reality of Tense
by: Richter, Stefanie
Published: (2018)
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
by: Feng, Yingchaojie, et al.
Published: (2024)
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
by: Shu, Dong, et al.
Published: (2024)
Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
by: Noughabi, Havva Alizadeh, et al.
Published: (2025)
AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
by: Lv, Lijia, et al.
Published: (2024)
Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
by: Lin, Yuping, et al.
Published: (2024)
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
by: Xu, Zhi, et al.
Published: (2026)
Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)
Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs
by: Zhou, Yao, et al.
Published: (2026)
Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models
by: Sun, Xiaobing, et al.
Published: (2026)
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
by: Wang, Zijun, et al.
Published: (2024)
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
by: Meng, Wenlong, et al.
Published: (2025)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
by: Oh, Sejoon, et al.
Published: (2024)
SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
by: Wong, Aidan, et al.
Published: (2024)
Merging Improves Self-Critique Against Jailbreak Attacks
by: Gallego, Victor
Published: (2024)
Defending LLMs against Jailbreaking Attacks via Backtranslation
by: Wang, Yihan, et al.
Published: (2024)
Languages Without Tense
by: Toosarvandani, Maziar
Published: (2025)
Boosting Jailbreak Attack with Momentum
by: Zhang, Yihao, et al.
Published: (2024)
Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers
by: Lin, Liang, et al.
Published: (2025)
Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
by: Miao, Ziqi, et al.
Published: (2025)
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
by: Guo, Xingang, et al.
Published: (2024)
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)