:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Liang; Xu, Zhihao; Tang, Xuehai; Liu, Shi; Zhou, Biyu; Zhu, Fuqing; Han, Jizhong; Hu, Songlin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.13474

Similar Items

Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
by: Yang, Xikang, et al.
Published: (2025)

LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing
by: Wang, Peng, et al.
Published: (2025)

FABLE: Fine-grained Fact Anchoring for Unstructured Model Editing
by: Wang, Peng, et al.
Published: (2026)

Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
by: Yang, Xikang, et al.
Published: (2024)

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
by: Lv, Lijia, et al.
Published: (2024)

Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens
by: Yang, Xikang, et al.
Published: (2024)

RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents
by: Xiao, Wenjie, et al.
Published: (2026)

The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
by: Yang, Xikang, et al.
Published: (2024)

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents
by: Liu, Shi, et al.
Published: (2026)

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
by: Lin, Yuping, et al.
Published: (2024)

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
by: Fang, Zhicheng, et al.
Published: (2026)

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills
by: Lv, Lijia, et al.
Published: (2026)

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)

CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
by: Liu, Zhihao, et al.
Published: (2024)

Navigating Through Paper Flood: Advancing LLM-based Paper Evaluation through Domain-Aware Retrieval and Latent Reasoning
by: Zheng, Wuqiang, et al.
Published: (2025)

Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)

Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
by: Meng, Wenlong, et al.
Published: (2025)

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
by: Mu, Honglin, et al.
Published: (2024)

LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
by: Lin, Shi, et al.
Published: (2024)

PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
by: Shen, Guobin, et al.
Published: (2025)

Defending LLMs against Jailbreaking Attacks via Backtranslation
by: Wang, Yihan, et al.
Published: (2024)

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
by: Guo, Xingang, et al.
Published: (2024)

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs
by: Zhou, Yao, et al.
Published: (2026)

JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
by: Lin, Guanyu, et al.
Published: (2024)

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks
by: Liu, Sheng, et al.
Published: (2025)

DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
by: Li, Yu, et al.
Published: (2025)

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
by: Xu, Zhangchen, et al.
Published: (2024)

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
by: Zeng, Yi, et al.
Published: (2024)

TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
by: Xu, Zhi, et al.
Published: (2026)

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)

PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
by: Wu, Yutao, et al.
Published: (2025)

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
by: Wang, Siyuan, et al.
Published: (2024)

PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant
by: Yin, Congrui, et al.
Published: (2025)

ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
by: Wang, Hao, et al.
Published: (2024)

Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
by: Noughabi, Havva Alizadeh, et al.
Published: (2025)

CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?
by: Ou, Jiefu, et al.
Published: (2025)

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
by: Chen, Kexin, et al.
Published: (2024)

Stop DDoS Attacking the Research Community with AI-Generated Survey Papers
by: Lin, Jianghao, et al.
Published: (2025)