:: Library Catalog

Bibliographic Details
Main authors:	Zhang, Jie; Ding, Meng; Liu, Yang; Hong, Jue; Tramèr, Florian
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Machine Learning
Online access:	https://arxiv.org/abs/2510.16794

Similar Items

Evading Black-box Classifiers Without Breaking Eggs
by: Debenedetti, Edoardo, et al.
Published: (2023)

The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
by: Nikolić, Kristina, et al.
Published: (2025)

Evaluations of Machine Learning Privacy Defenses are Misleading
by: Aerni, Michael, et al.
Published: (2024)

Adversarial Search Engine Optimization for Large Language Models
by: Nestaas, Fredrik, et al.
Published: (2024)

Membership Inference Attacks on Sequence Models
by: Rossi, Lorenzo, et al.
Published: (2025)

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
by: Rando, Javier, et al.
Published: (2025)

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
by: Zhang, Jie, et al.
Published: (2024)

Laundering AI Authority with Adversarial Examples
by: Zhang, Jie, et al.
Published: (2026)

Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
by: Feng, Shanglun, et al.
Published: (2024)

Blind Baselines Beat Membership Inference Attacks for Foundation Models
by: Das, Debeshee, et al.
Published: (2024)

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
by: Debenedetti, Edoardo, et al.
Published: (2024)

Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
by: Tramèr, Florian, et al.
Published: (2022)

Evaluating the Robustness of the "Ensemble Everything Everywhere" Defense
by: Zhang, Jie, et al.
Published: (2024)

Universal Jailbreak Backdoors from Poisoned Human Feedback
by: Rando, Javier, et al.
Published: (2023)

Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
by: Coalson, Zachary, et al.
Published: (2026)

Traceable Black-box Watermarks for Federated Learning
by: Xu, Jiahao, et al.
Published: (2025)

Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs -- A Graph Sequential Embedding Method
by: Li, Jiate, et al.
Published: (2024)

LoRAGuard: An Effective Black-box Watermarking Approach for LoRAs
by: Lv, Peizhuo, et al.
Published: (2025)

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
by: Carlini, Nicholas, et al.
Published: (2025)

Online Poisoning Attack Against Reinforcement Learning under Black-box Environments
by: Li, Jianhui, et al.
Published: (2024)

Dynamic Black-box Backdoor Attacks on IoT Sensory Data
by: Chathoth, Ajesh Koyatan, et al.
Published: (2025)

Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective
by: Roshan, Khushnaseeb, et al.
Published: (2024)

Design Patterns for Securing LLM Agents against Prompt Injections
by: Beurer-Kellner, Luca, et al.
Published: (2025)

SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks
by: Gao, Yue, et al.
Published: (2023)

Large-scale online deanonymization with LLMs
by: Lermen, Simon, et al.
Published: (2026)

A Generative Approach to Surrogate-based Black-box Attacks
by: Moraffah, Raha, et al.
Published: (2024)

Privacy Side Channels in Machine Learning Systems
by: Debenedetti, Edoardo, et al.
Published: (2023)

Poisoning Web-Scale Training Datasets is Practical
by: Carlini, Nicholas, et al.
Published: (2023)

Multi-granular Adversarial Attacks against Black-box Neural Ranking Models
by: Liu, Yu-An, et al.
Published: (2024)

A General Black-box Adversarial Attack on Graph-based Fake News Detectors
by: Zhu, Peican, et al.
Published: (2024)

ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking
by: Li, Yunzhe, et al.
Published: (2025)

Query-Based Adversarial Prompt Generation
by: Hayase, Jonathan, et al.
Published: (2024)

EvadeDroid: A Practical Evasion Attack on Machine Learning for Black-box Android Malware Detection
by: Bostani, Hamid, et al.
Published: (2021)

MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
by: Yamabe, Shojiro, et al.
Published: (2024)

Output Perturbation for Differentially Private Convex Optimization: Faster and More General
by: Lowy, Andrew, et al.
Published: (2021)

AED: An black-box NLP classifier model attacker
by: Liu, Yueyang, et al.
Published: (2021)

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
by: Huang, Yangsibo, et al.
Published: (2025)

Localizing Malicious Outputs from CodeLLM
by: Borana, Mayukh, et al.
Published: (2025)

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)

An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)