| Main Authors: | Liu, Fuqiang; Jiang, Sicong; Miranda-Moreno, Luis; Choi, Seongjin; Sun, Lijun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.08099 |
Similar Items
Adversarial Text Purification: A Large Language Model Approach for Defense
by: Moraffah, Raha, et al.
Published: (2024)
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
by: Xu, Jiashu, et al.
Published: (2023)
The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)
RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models
by: Chugh, Rishit
Published: (2026)
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
by: Li, Lijun, et al.
Published: (2024)
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)
BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
by: Xue, Jiaqi, et al.
Published: (2024)
On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)
Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
by: Miranda, Michele, et al.
Published: (2024)
Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities
by: Khattar, Vanshaj, et al.
Published: (2026)
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)
EnJa: Ensemble Jailbreak on Large Language Models
by: Zhang, Jiahao, et al.
Published: (2024)
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
by: Yi, Sibo, et al.
Published: (2024)
Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting
by: To, Bang Trinh Tran, et al.
Published: (2025)
Adversarial Attacks on Large Language Models Using Regularized Relaxation
by: Chacko, Samuel Jacob, et al.
Published: (2024)
Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
by: Zhao, Shuai, et al.
Published: (2024)
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
by: Fu, Tingchen, et al.
Published: (2024)
Jailbreaking and Mitigation of Vulnerabilities in Large Language Models
by: Peng, Benji, et al.
Published: (2024)
Evaluating Adversarial Vulnerabilities in Modern Large Language Models
by: Perel, Tom
Published: (2025)
SVIP: Towards Verifiable Inference of Open-source Large Language Models
by: Sun, Yifan, et al.
Published: (2024)
Unlocking Memorization in Large Language Models with Dynamic Soft Prompting
by: Wang, Zhepeng, et al.
Published: (2024)
Large Language Model Sentinel: LLM Agent for Adversarial Purification
by: Lin, Guang, et al.
Published: (2024)
Finetuning Large Language Models for Vulnerability Detection
by: Shestov, Alexey, et al.
Published: (2024)
BACKTIME: Backdoor Attacks on Multivariate Time Series Forecasting
by: Lin, Xiao, et al.
Published: (2024)
On the Role of Attention Heads in Large Language Model Safety
by: Zhou, Zhenhong, et al.
Published: (2024)
In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
by: Shetty, Anudeex, et al.
Published: (2026)
Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models
by: Liu, Xiao, et al.
Published: (2024)
A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem
by: Ogundoyin, Sunday Oyinlola, et al.
Published: (2025)
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
by: Nihal, Ragib Amin, et al.
Published: (2025)
from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
by: Yan, Yu, et al.
Published: (2025)
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)
Instructional Fingerprinting of Large Language Models
by: Xu, Jiashu, et al.
Published: (2024)
LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis
by: Alhazbi, Saeif, et al.
Published: (2025)
Large Language Models in Cybersecurity: State-of-the-Art
by: Motlagh, Farzad Nourmohammadzadeh, et al.
Published: (2024)
Duwak: Dual Watermarks in Large Language Models
by: Zhu, Chaoyi, et al.
Published: (2024)
Jailbreaking Large Language Models with Symbolic Mathematics
by: Bethany, Emet, et al.
Published: (2024)
Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models
by: Zheng, Yujia, et al.
Published: (2025)
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
by: Huang, Tiansheng, et al.
Published: (2025)