Library Catalog

Bibliographic Details
Main Authors:	Liu, Fuqiang, Jiang, Sicong, Miranda-Moreno, Luis, Choi, Seongjin, Sun, Lijun
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; Cryptography and Security
Online Access:	https://arxiv.org/abs/2412.08099

Similar Items

Adversarial Text Purification: A Large Language Model Approach for Defense
By: Moraffah, Raha, et al.
Published: (2024)

Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
By: Xu, Jiashu, et al.
Published: (2023)

The Resurgence of GCG Adversarial Attacks on Large Language Models
By: Tan, Yuting, et al.
Published: (2025)

Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
By: Zhang, Yihao, et al.
Published: (2024)

RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models
By: Chugh, Rishit
Published: (2026)

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
By: Yuan, Hongbang, et al.
Published: (2024)

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
By: Li, Lijun, et al.
Published: (2024)

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
By: Reddy, Aashray, et al.
Published: (2025)

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
By: Xue, Jiaqi, et al.
Published: (2024)

On Adversarial Robustness of Language Models in Transfer Learning
By: Turbal, Bohdan, et al.
Published: (2024)

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
By: Miranda, Michele, et al.
Published: (2024)

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities
By: Khattar, Vanshaj, et al.
Published: (2026)

Time Travel in LLMs: Tracing Data Contamination in Large Language Models
By: Golchin, Shahriar, et al.
Published: (2023)

EnJa: Ensemble Jailbreak on Large Language Models
By: Zhang, Jiahao, et al.
Published: (2024)

Jailbreak Attacks and Defenses Against Large Language Models: A Survey
By: Yi, Sibo, et al.
Published: (2024)

Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting
By: To, Bang Trinh Tran, et al.
Published: (2025)

Adversarial Attacks on Large Language Models Using Regularized Relaxation
By: Chacko, Samuel Jacob, et al.
Published: (2024)

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
By: Zhao, Shuai, et al.
Published: (2024)

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
By: Fu, Tingchen, et al.
Published: (2024)

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models
By: Peng, Benji, et al.
Published: (2024)

Evaluating Adversarial Vulnerabilities in Modern Large Language Models
By: Perel, Tom
Published: (2025)

SVIP: Towards Verifiable Inference of Open-source Large Language Models
By: Sun, Yifan, et al.
Published: (2024)

Unlocking Memorization in Large Language Models with Dynamic Soft Prompting
By: Wang, Zhepeng, et al.
Published: (2024)

Large Language Model Sentinel: LLM Agent for Adversarial Purification
By: Lin, Guang, et al.
Published: (2024)

Finetuning Large Language Models for Vulnerability Detection
By: Shestov, Alexey, et al.
Published: (2024)

BACKTIME: Backdoor Attacks on Multivariate Time Series Forecasting
By: Lin, Xiao, et al.
Published: (2024)

On the Role of Attention Heads in Large Language Model Safety
By: Zhou, Zhenhong, et al.
Published: (2024)

In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
By: Shetty, Anudeex, et al.
Published: (2026)

Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models
By: Liu, Xiao, et al.
Published: (2024)

A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem
By: Ogundoyin, Sunday Oyinlola, et al.
Published: (2025)

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
By: Nihal, Ragib Amin, et al.
Published: (2025)

from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
By: Yan, Yu, et al.
Published: (2025)

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
By: Struppek, Lukas, et al.
Published: (2026)

Instructional Fingerprinting of Large Language Models
By: Xu, Jiashu, et al.
Published: (2024)

LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis
By: Alhazbi, Saeif, et al.
Published: (2025)

Large Language Models in Cybersecurity: State-of-the-Art
By: Motlagh, Farzad Nourmohammadzadeh, et al.
Published: (2024)

Duwak: Dual Watermarks in Large Language Models
By: Zhu, Chaoyi, et al.
Published: (2024)

Jailbreaking Large Language Models with Symbolic Mathematics
By: Bethany, Emet, et al.
Published: (2024)

Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models
By: Zheng, Yujia, et al.
Published: (2025)

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
By: Huang, Tiansheng, et al.
Published: (2025)