:: Library Catalog

Bibliographic Details
Main Author:	Mukhopadhyay, Snehasis
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2605.27545

Similar Items

DeceptGuard: A Constitutional Oversight Framework For Detecting Deception in LLM Agents
by: Mukhopadhyay, Snehasis
Published: (2026)

Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)

Who Benefits From Sinus Surgery? Comparing Generative AI and Supervised Machine Learning for Predicting Surgical Outcomes in Chronic Rhinosinusitis
by: Chowdhury, Sayeed Shafayet, et al.
Published: (2026)

SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
by: Feng, Mingqian, et al.
Published: (2026)

DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
by: Lucas, Jason, et al.
Published: (2026)

Quantification of Tenseness in English and Japanese Tense-Lax Vowels: A Lagrangian Model with Indicator θ1 and Force of Tenseness Ftense(t)
by: Ishizaki, Tatsuya
Published: (2025)

All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
by: Takemoto, Kazuhiro
Published: (2024)

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
by: Liu, Xuannan, et al.
Published: (2024)

Jailbreaking Attack against Multimodal Large Language Model
by: Niu, Zhenxing, et al.
Published: (2024)

AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
by: Mukhopadhyay, Snehasis, et al.
Published: (2025)

Enhancing Jailbreak Attacks with Diversity Guidance
by: Zhang, Xu, et al.
Published: (2024)

MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
by: You, Wenhao, et al.
Published: (2025)

Efficient LLM-Jailbreaking via Multimodal-LLM Jailbreak
by: Ji, Haoxuan, et al.
Published: (2024)

Jailbreaking Large Language Models with Morality Attacks
by: Su, Ying, et al.
Published: (2026)

A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
by: Luo, Xuan, et al.
Published: (2025)

The Reality of Tense
by: Richter, Stefanie
Published: (2018)

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
by: Feng, Yingchaojie, et al.
Published: (2024)

AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
by: Shu, Dong, et al.
Published: (2024)

Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
by: Noughabi, Havva Alizadeh, et al.
Published: (2025)

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
by: Lv, Lijia, et al.
Published: (2024)

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
by: Lin, Yuping, et al.
Published: (2024)

TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
by: Xu, Zhi, et al.
Published: (2026)

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)

JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)

PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs
by: Zhou, Yao, et al.
Published: (2026)

Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models
by: Sun, Xiaobing, et al.
Published: (2026)

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
by: Wang, Zijun, et al.
Published: (2024)

Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
by: Meng, Wenlong, et al.
Published: (2025)

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)

UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
by: Oh, Sejoon, et al.
Published: (2024)

SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
by: Wong, Aidan, et al.
Published: (2024)

Merging Improves Self-Critique Against Jailbreak Attacks
by: Gallego, Victor
Published: (2024)

Defending LLMs against Jailbreaking Attacks via Backtranslation
by: Wang, Yihan, et al.
Published: (2024)

Languages Without Tense
by: Toosarvandani, Maziar
Published: (2025)

Boosting Jailbreak Attack with Momentum
by: Zhang, Yihao, et al.
Published: (2024)

Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers
by: Lin, Liang, et al.
Published: (2025)

Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
by: Miao, Ziqi, et al.
Published: (2025)

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
by: Guo, Xingang, et al.
Published: (2024)

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)