| Main authors: | Zhang, Jie; Ding, Meng; Liu, Yang; Hong, Jue; Tramèr, Florian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2510.16794 |
Similar items
Evading Black-box Classifiers Without Breaking Eggs
by: Debenedetti, Edoardo, et al.
Published: (2023)
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
by: Nikolić, Kristina, et al.
Published: (2025)
Evaluations of Machine Learning Privacy Defenses are Misleading
by: Aerni, Michael, et al.
Published: (2024)
Adversarial Search Engine Optimization for Large Language Models
by: Nestaas, Fredrik, et al.
Published: (2024)
Membership Inference Attacks on Sequence Models
by: Rossi, Lorenzo, et al.
Published: (2025)
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
by: Rando, Javier, et al.
Published: (2025)
Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
by: Zhang, Jie, et al.
Published: (2024)
Laundering AI Authority with Adversarial Examples
by: Zhang, Jie, et al.
Published: (2026)
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
by: Feng, Shanglun, et al.
Published: (2024)
Blind Baselines Beat Membership Inference Attacks for Foundation Models
by: Das, Debeshee, et al.
Published: (2024)
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
by: Debenedetti, Edoardo, et al.
Published: (2024)
Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
by: Tramèr, Florian, et al.
Published: (2022)
Evaluating the Robustness of the "Ensemble Everything Everywhere" Defense
by: Zhang, Jie, et al.
Published: (2024)
Universal Jailbreak Backdoors from Poisoned Human Feedback
by: Rando, Javier, et al.
Published: (2023)
Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
by: Coalson, Zachary, et al.
Published: (2026)
Traceable Black-box Watermarks for Federated Learning
by: Xu, Jiahao, et al.
Published: (2025)
Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs -- A Graph Sequential Embedding Method
by: Li, Jiate, et al.
Published: (2024)
LoRAGuard: An Effective Black-box Watermarking Approach for LoRAs
by: Lv, Peizhuo, et al.
Published: (2025)
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
by: Carlini, Nicholas, et al.
Published: (2025)
Online Poisoning Attack Against Reinforcement Learning under Black-box Environments
by: Li, Jianhui, et al.
Published: (2024)
Dynamic Black-box Backdoor Attacks on IoT Sensory Data
by: Chathoth, Ajesh Koyatan, et al.
Published: (2025)
Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective
by: Roshan, Khushnaseeb, et al.
Published: (2024)
Design Patterns for Securing LLM Agents against Prompt Injections
by: Beurer-Kellner, Luca, et al.
Published: (2025)
SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks
by: Gao, Yue, et al.
Published: (2023)
Large-scale online deanonymization with LLMs
by: Lermen, Simon, et al.
Published: (2026)
A Generative Approach to Surrogate-based Black-box Attacks
by: Moraffah, Raha, et al.
Published: (2024)
Privacy Side Channels in Machine Learning Systems
by: Debenedetti, Edoardo, et al.
Published: (2023)
Poisoning Web-Scale Training Datasets is Practical
by: Carlini, Nicholas, et al.
Published: (2023)
Multi-granular Adversarial Attacks against Black-box Neural Ranking Models
by: Liu, Yu-An, et al.
Published: (2024)
A General Black-box Adversarial Attack on Graph-based Fake News Detectors
by: Zhu, Peican, et al.
Published: (2024)
ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking
by: Li, Yunzhe, et al.
Published: (2025)
Query-Based Adversarial Prompt Generation
by: Hayase, Jonathan, et al.
Published: (2024)
EvadeDroid: A Practical Evasion Attack on Machine Learning for Black-box Android Malware Detection
by: Bostani, Hamid, et al.
Published: (2021)
MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
by: Yamabe, Shojiro, et al.
Published: (2024)
Output Perturbation for Differentially Private Convex Optimization: Faster and More General
by: Lowy, Andrew, et al.
Published: (2021)
AED: An black-box NLP classifier model attacker
by: Liu, Yueyang, et al.
Published: (2021)
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
by: Huang, Yangsibo, et al.
Published: (2025)
Localizing Malicious Outputs from CodeLLM
by: Borana, Mayukh, et al.
Published: (2025)
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)
An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)