| Main Authors: | Sun, Zhiyu, Luo, Minrui, Wang, Yu, Chen, Zhili, He, Tianxing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.10134 |
Similar Items
CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering
by: Chen, Baicheng, et al.
Published: (2026)
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
by: Dong, Xiaoning, et al.
Published: (2024)
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
by: Wang, Yanbo, et al.
Published: (2025)
Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models
by: Yu, Yongcan, et al.
Published: (2025)
The Fire Thief Is Also the Keeper: Balancing Usability and Privacy in Prompts
by: Shen, Zhili, et al.
Published: (2024)
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
by: Ruzzetti, Elena Sofia, et al.
Published: (2025)
Resource Consumption Threats in Large Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)
DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts
by: Sun, Xiongtao, et al.
Published: (2024)
Activation-Guided Local Editing for Jailbreaking Attacks
by: Wang, Jiecong, et al.
Published: (2025)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
by: Liu, Yanjiang, et al.
Published: (2025)
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging
by: Cong, Tianshuo, et al.
Published: (2024)
SoK: Large Language Model Copyright Auditing via Fingerprinting
by: Shao, Shuo, et al.
Published: (2025)
REEF: Representation Encoding Fingerprints for Large Language Models
by: Zhang, Jie, et al.
Published: (2024)
Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond
by: Wang, Xuhong, et al.
Published: (2024)
Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models
by: Yu, Ye, et al.
Published: (2026)
from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
by: Yan, Yu, et al.
Published: (2025)
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models
by: Ran, Delong, et al.
Published: (2025)
Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
by: Xie, Zhixin, et al.
Published: (2025)
Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models
by: Huang, Chung-ju, et al.
Published: (2026)
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)
Distract Large Language Models for Automatic Jailbreak Attack
by: Xiao, Zeguan, et al.
Published: (2024)
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)
Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models
by: Min, Nay Myat, et al.
Published: (2026)
Internal Safety Collapse in Frontier Large Language Models
by: Wu, Yutao, et al.
Published: (2026)
Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
by: Zhao, Wei, et al.
Published: (2024)
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
by: Zhao, Shiji, et al.
Published: (2025)
PRISON: Unmasking the Criminal Potential of Large Language Models
by: Wu, Xinyi, et al.
Published: (2025)
Text Embedding Inversion Security for Multilingual Language Models
by: Chen, Yiyi, et al.
Published: (2024)
Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage
by: Hoque, Shahinul, et al.
Published: (2026)
Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models
by: Yi, Sibo, et al.
Published: (2025)
FNF: Functional Network Fingerprint for Large Language Models
by: Liu, Yiheng, et al.
Published: (2026)
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)
Is the System Message Really Important to Jailbreaks in Large Language Models?
by: Zou, Xiaotian, et al.
Published: (2024)
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
by: Gu, Tianle, et al.
Published: (2024)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
Efficient Detection of Toxic Prompts in Large Language Models
by: Liu, Yi, et al.
Published: (2024)
Citation: A Key to Building Responsible and Accountable Large Language Models
by: Huang, Jie, et al.
Published: (2023)
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
by: Cao, Yuanpu, et al.
Published: (2023)