| Main Authors: | Dong, Jianshuo, Zhang, Ziyuan, Zhang, Qingjie, Zhang, Tianwei, Wang, Hao, Li, Hewu, Li, Qi, Zhang, Chao, Xu, Ke, Qiu, Han |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.19394 |
Similar Items
Can Large Language Models Automate the Refinement of Cellular Network Specifications?
by: Dong, Jianshuo, et al.
Published: (2025)
Towards Understanding the Cognitive Habits of Large Reasoning Models
by: Dong, Jianshuo, et al.
Published: (2025)
LeakDojo: Decoding the Leakage Threats of RAG Systems
by: Zhang, Maosen, et al.
Published: (2026)
BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
by: Yan, Xiaobei, et al.
Published: (2025)
SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)
Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
by: Chen, Meng, et al.
Published: (2026)
DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling
by: Li, Boheng, et al.
Published: (2025)
When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
by: Ou, Haoran, et al.
Published: (2025)
State-Dependent Safety Failures in Multi-Turn Language Model Interaction
by: Li, Pengcheng, et al.
Published: (2026)
Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures
by: Su, Yanghao, et al.
Published: (2026)
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users
by: Li, Guanlin, et al.
Published: (2024)
On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression
by: Zhang, Xinwei, et al.
Published: (2026)
PRIVMARK: Private Large Language Models Watermarking with MPC
by: Fargues, Thomas, et al.
Published: (2025)
Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models
by: Xiong, Junjie, et al.
Published: (2025)
SAID: Safety-Aware Intent Defense via Prefix Probing for Large Language Models
by: Chen, Yulong, et al.
Published: (2025)
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
by: Zhang, Jingxuan, et al.
Published: (2025)
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
by: Liu, Tong, et al.
Published: (2024)
Large Language Model Watermark Stealing With Mixed Integer Programming
by: Zhang, Zhaoxi, et al.
Published: (2024)
ObfusBFA: A Holistic Approach to Safeguarding DNNs from Different Types of Bit-Flip Attacks
by: Yan, Xiaobei, et al.
Published: (2025)
SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models
by: Liu, Renyang, et al.
Published: (2026)
InferDPT: Privacy-Preserving Inference for Closed-box Large Language Model
by: Tong, Meng, et al.
Published: (2023)
ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs
by: Yang, Yuchen, et al.
Published: (2024)
Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning
by: Yang, Xianglin, et al.
Published: (2025)
GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models
by: Xu, Zuyao, et al.
Published: (2026)
Safeguarding Large Language Models: A Survey
by: Dong, Yi, et al.
Published: (2024)
Safety Layers in Aligned Large Language Models: The Key to LLM Security
by: Li, Shen, et al.
Published: (2024)
SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models
by: Zhang, Jiawen, et al.
Published: (2025)
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
by: Hong, Hanbin, et al.
Published: (2025)
What Makes a Good LLM Agent for Real-world Penetration Testing?
by: Deng, Gelei, et al.
Published: (2026)
Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models
by: Shu, Dong, et al.
Published: (2024)
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
by: Ma, Jiachen, et al.
Published: (2024)
CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models
by: Zhang, Shuhao, et al.
Published: (2025)
Membership Inference Attacks on Tokenizers of Large Language Models
by: Tong, Meng, et al.
Published: (2025)
Fluent: Round-efficient Secure Aggregation for Private Federated Learning
by: Li, Xincheng, et al.
Published: (2024)
BadEdit: Backdooring large language models by model editing
by: Li, Yanzhou, et al.
Published: (2024)
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
by: Deng, Gelei, et al.
Published: (2023)
IRCopilot: Automated Incident Response with Large Language Models
by: Lin, Xihuan, et al.
Published: (2025)
Masked Language Model Based Textual Adversarial Example Detection
by: Zhang, Xiaomei, et al.
Published: (2023)
CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain
by: Wang, Lingxiang, et al.
Published: (2025)
AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents
by: Zhang, Yixiang, et al.
Published: (2026)