| Main authors: | Hui, Bo; Yuan, Haolin; Gong, Neil; Burlina, Philippe; Cao, Yinzhi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2405.06823 |
Similar items
A Survey on Model Extraction Attacks and Defenses for Large Language Models
by: Zhao, Kaixiang, et al.
Published: (2025)
Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning
by: Brimhall, Brennon, et al.
Published: (2025)
LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks
by: Panebianco, Francesco, et al.
Published: (2025)
Prompt Injection Attacks on Large Language Models in Oncology
by: Clusmann, Jan, et al.
Published: (2024)
CHAI: Command Hijacking against embodied AI
by: Burbano, Luis, et al.
Published: (2025)
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
by: Liu, Yupei, et al.
Published: (2023)
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)
Refusing Safe Prompts for Multi-modal Large Language Models
by: Shao, Zedian, et al.
Published: (2024)
Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness
by: Wang, Cheng-Long, et al.
Published: (2025)
A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments
by: Zhao, Kaixiang, et al.
Published: (2025)
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives
by: Zhao, Kaixiang, et al.
Published: (2025)
Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy
by: Sui, Zhihao, et al.
Published: (2025)
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
by: Nie, Yuzhou, et al.
Published: (2024)
Your Agent Can Defend Itself against Backdoor Attacks
by: Changjiang, Li, et al.
Published: (2025)
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
by: Yuan, Zhuowen, et al.
Published: (2024)
Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening
by: Zhang, Mohan, et al.
Published: (2026)
Context-Aware Membership Inference Attacks against Pre-trained Large Language Models
by: Chang, Hongyan, et al.
Published: (2024)
Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models
by: Guo, Qiming, et al.
Published: (2025)
Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models
by: Li, Xiao, et al.
Published: (2024)
Backdoor Attack against One-Class Sequential Anomaly Detection Models
by: Cheng, He, et al.
Published: (2024)
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)
Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
by: Shao, Zedian, et al.
Published: (2026)
A Critical Evaluation of Defenses against Prompt Injection Attacks
by: Jia, Yuqi, et al.
Published: (2025)
Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing
by: Wahréus, Johan, et al.
Published: (2025)
Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models
by: Sivashanmugam, Sathesh P.
Published: (2025)
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
by: Huang, Tiansheng, et al.
Published: (2024)
Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
by: Yuan, Leitao, et al.
Published: (2026)
Unlocking Memorization in Large Language Models with Dynamic Soft Prompting
by: Wang, Zhepeng, et al.
Published: (2024)
Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning
by: Zhang, Yujie, et al.
Published: (2024)
From Data Leak to Secret Misses: The Impact of Data Leakage on Secret Detection Models
by: Soltaniani, Farnaz, et al.
Published: (2026)
Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts
by: Liu, Yi, et al.
Published: (2024)
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
by: Liu, Guozhi, et al.
Published: (2025)
Turning Black Box into White Box: Dataset Distillation Leaks
by: Chen, Huajie, et al.
Published: (2026)
Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques
by: Jaffal, Niveen O., et al.
Published: (2025)
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
by: Hung, Kuo-Han, et al.
Published: (2024)
Verification of Bit-Flip Attacks against Quantized Neural Networks
by: Zhang, Yedi, et al.
Published: (2025)
Client-Side Patching against Backdoor Attacks in Federated Learning
by: Molina-Coronado, Borja
Published: (2024)
The Application of Transformer-Based Models for Predicting Consequences of Cyber Attacks
by: Chhetri, Bipin, et al.
Published: (2025)
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
by: Yue, Murong, et al.
Published: (2025)