| Main Author: | Bonetto, Davi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.12414 |
Similar Items
Persona-Model Collapse in Emergent Misalignment
by: Costa, Davi Bastos, et al.
Published: (2026)
When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models
by: Hossain, Ismail, et al.
Published: (2026)
VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks
by: Tsaprazlis, Efthymios, et al.
Published: (2025)
State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space
by: Guo, Ji, et al.
Published: (2026)
Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks
by: Noorbakhsh, Sayedeh Leila, et al.
Published: (2024)
PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis
by: Islam, Md Robiul, et al.
Published: (2024)
Attack and Defense of Deep Learning Models in the Field of Web Attack Detection
by: Shi, Lijia, et al.
Published: (2024)
Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems
by: Li, Xin, et al.
Published: (2026)
Learning in Multiple Spaces: Few-Shot Network Attack Detection with Metric-Fused Prototypical Networks
by: Martinez-Lopez, Fernando, et al.
Published: (2024)
MRMMIA: Membership Inference Attacks on Memory in Chat Agents
by: Chen, Kai, et al.
Published: (2026)
Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection
by: Guo, Yihao, et al.
Published: (2025)
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
by: Cornacchia, Giandomenico, et al.
Published: (2024)
SMA-DP: Spectral Memory-Aware Differential Privacy for Deep Learning
by: Partohaghighi, Mohammad, et al.
Published: (2026)
Walma: Learning to See Memory Corruption in WebAssembly
by: Draissi, Oussama, et al.
Published: (2026)
SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation
by: Su, Guangzhi, et al.
Published: (2025)
Bypassing Prompt Guards in Production with Controlled-Release Prompting
by: Fairoze, Jaiden, et al.
Published: (2025)
AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection
by: Ferrel, Vickson
Published: (2026)
WebGuard++: Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT
by: Tian, Ye, et al.
Published: (2025)
ADVENT: Attack/Anomaly Detection in VANETs
by: Baharlouei, Hamideh, et al.
Published: (2024)
Trojan Cleansing with Neural Collapse
by: Gu, Xihe, et al.
Published: (2024)
Intriguing Properties of Adversarial ML Attacks in the Problem Space [Extended Version]
by: Cortellazzi, Jacopo, et al.
Published: (2019)
COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models
by: Das, Sanjay, et al.
Published: (2025)
Using Anomaly Detection to Detect Poisoning Attacks in Federated Learning Applications
by: Raza, Ali, et al.
Published: (2022)
FedRecAttack: Model Poisoning Attack to Federated Recommendation
by: Rong, Dazhong, et al.
Published: (2022)
FRIDA: Free-Rider Detection using Privacy Attacks
by: Recasens, Pol G., et al.
Published: (2024)
ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks
by: Cheng, Zhan, et al.
Published: (2025)
Calibration Attacks: A Comprehensive Study of Adversarial Attacks on Model Confidence
by: Obadinma, Stephen, et al.
Published: (2024)
Model Extraction Attacks Revisited
by: Liang, Jiacheng, et al.
Published: (2023)
On Calibration of LLM-based Guard Models for Reliable Content Moderation
by: Liu, Hongfu, et al.
Published: (2024)
Detecting Instruction Fine-tuning Attacks using Influence Function
by: Li, Jiawei
Published: (2025)
GasTrace: Detecting Sandwich Attack Malicious Accounts in Ethereum
by: Liu, Zekai, et al.
Published: (2024)
DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks
by: Sui, Hao, et al.
Published: (2024)
SENet: Visual Detection of Online Social Engineering Attack Campaigns
by: Ozen, Irfan, et al.
Published: (2024)
Analysis of Zero Day Attack Detection Using MLP and XAI
by: Dahal, Ashim, et al.
Published: (2025)
Detecting Backdoor Attacks via Similarity in Semantic Communication Systems
by: Wei, Ziyang, et al.
Published: (2025)
GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption
by: Frimpong, Eugene, et al.
Published: (2024)
KubeGuard: LLM-Assisted Kubernetes Hardening via Configuration Files and Runtime Logs Analysis
by: Cohen, Omri Sgan, et al.
Published: (2025)
Membership Inference Attacks on Sequence Models
by: Rossi, Lorenzo, et al.
Published: (2025)
Model Hijacking Attack in Federated Learning
by: Li, Zheng, et al.
Published: (2024)
Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks
by: Giri, Nandakrishna, et al.
Published: (2026)