| Main Authors: | Jiang, Yukun, Li, Mingjie, Backes, Michael, Zhang, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.21189 |
Similar Items
Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs
by: Jiang, Yukun, et al.
Published: (2026)
$\texttt{ModSCAN}$: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities
by: Jiang, Yukun, et al.
Published: (2024)
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
by: Shen, Xinyue, et al.
Published: (2023)
JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring
by: Chu, Junjie, et al.
Published: (2025)
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
by: Akkus, Atilla, et al.
Published: (2024)
Voice Jailbreak Attacks Against GPT-4o
by: Shen, Xinyue, et al.
Published: (2024)
Peering Behind the Shield: Guardrail Identification in Large Language Models
by: Yang, Ziqing, et al.
Published: (2025)
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
by: Jiang, Yukun, et al.
Published: (2026)
Excessive Reasoning Attack on Reasoning LLMs
by: Si, Wai Man, et al.
Published: (2025)
Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models
by: Chen, Zeyuan, et al.
Published: (2026)
Real Money, Fake Models: Deceptive Model Claims in Shadow APIs
by: Zhang, Yage, et al.
Published: (2026)
Watermarking LLM-Generated Datasets in Downstream Tasks
by: Liu, Yugeng, et al.
Published: (2025)
From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection
by: Liang, Mengfei, et al.
Published: (2025)
Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
by: Liu, Yugeng, et al.
Published: (2023)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
by: Deng, Gelei, et al.
Published: (2023)
"Humans welcome to observe": A First Look at the Agent Social Network Moltbook
by: Jiang, Yukun, et al.
Published: (2026)
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities
by: Qu, Yiting, et al.
Published: (2025)
The Challenge of Identifying the Origin of Black-Box Large Language Models
by: Yang, Ziqing, et al.
Published: (2025)
SOS! Soft Prompt Attack Against Open-Source Large Language Models
by: Yang, Ziqing, et al.
Published: (2024)
Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models
by: Dong, Yiting, et al.
Published: (2024)
Mitigating Jailbreaks with Intent-Aware LLMs
by: Yeo, Wei Jie, et al.
Published: (2025)
Composite Backdoor Attacks Against Large Language Models
by: Huang, Hai, et al.
Published: (2023)
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
by: Shang, Shang, et al.
Published: (2024)
Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models
by: Zou, Quanchen, et al.
Published: (2026)
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
by: Zhao, Jiawei, et al.
Published: (2024)
Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
by: Zhang, Deyue, et al.
Published: (2025)
SAID: Safety-Aware Intent Defense via Prefix Probing for Large Language Models
by: Chen, Yulong, et al.
Published: (2025)
Efficient Data-Free Model Stealing with Label Diversity
by: Liu, Yiyong, et al.
Published: (2024)
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
by: Hao, Shuyang, et al.
Published: (2025)
A Cross-Language Investigation into Jailbreak Attacks in Large Language Models
by: Li, Jie, et al.
Published: (2024)
Jailbreaking Large Language Models in Infinitely Many Ways
by: Goldstein, Oliver, et al.
Published: (2025)
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)
TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis
by: Wang, Longtian, et al.
Published: (2025)
Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing
by: Lintelo, Jona te, et al.
Published: (2026)
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
by: Li, Xuan, et al.
Published: (2023)
Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks
by: Zhang, Minxing, et al.
Published: (2024)
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
by: Liu, Aofan, et al.
Published: (2025)
Model-Editing-Based Jailbreak against Safety-aligned Large Language Models
by: Li, Yuxi, et al.
Published: (2024)
Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning
by: Wang, Zhaoqi, et al.
Published: (2026)