:: Library Catalog

Bibliographic Details
Main Author:	Halloran, John T.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Cryptography and Security
Online Access:	https://arxiv.org/abs/2605.11217

Similar Items

MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits
by: Radosevich, Brandon, et al.
Published: (2025)

Understanding the Effects of Safety Unalignment on Large Language Models
by: Halloran, John T.
Published: (2026)

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks
by: Halloran, John T., et al.
Published: (2026)

MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
by: Halloran, John
Published: (2025)

Private-RAG: Answering Multiple Queries with LLMs while Keeping Your Data Private
by: Wu, Ruihan, et al.
Published: (2025)

RAG with Differential Privacy
by: Grislain, Nicolas
Published: (2024)

SME-TEAM: Leveraging Trust and Ethics for Secure and Responsible Use of AI and LLMs in SMEs
by: Sarker, Iqbal H., et al.
Published: (2025)

GraphRAG under Fire
by: Liang, Jiacheng, et al.
Published: (2025)

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
by: Ghosal, Soumya Suvra, et al.
Published: (2024)

Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning
by: Zhou, Xinjie, et al.
Published: (2026)

Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Model Watermarking
by: Kong, Cong, et al.
Published: (2024)

CLASP: Training-Free LLM-Assisted Source Code Watermarking via Semantic-Preserving Transformations
by: Xu, Rui, et al.
Published: (2025)

Ward: Provable RAG Dataset Inference via LLM Watermarks
by: Jovanović, Nikola, et al.
Published: (2024)

Res-MIA: A Training-Free Resolution-Based Membership Inference Attack on Federated Learning Models
by: Zare, Mohammad, et al.
Published: (2026)

Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
by: Thornton, Scott
Published: (2026)

Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)

Topological Signatures of Adversaries in Multimodal Alignments
by: Vu, Minh, et al.
Published: (2025)

How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System
by: Zuo, Kaiwen, et al.
Published: (2025)

No More, No Less: Task Alignment in Terminal Agents
by: Mavali, Sina, et al.
Published: (2026)

MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks
by: Zhang, Jiaming, et al.
Published: (2023)

Leveraging AI to optimize website structure discovery during Penetration Testing
by: Antonelli, Diego, et al.
Published: (2021)

Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations
by: Wang, Cheng, et al.
Published: (2024)

Improved Algorithms for Differentially Private Language Model Alignment
by: Chen, Keyu, et al.
Published: (2025)

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment
by: Li, Yuxi, et al.
Published: (2024)

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
by: Rashid, Md Rafi Ur, et al.
Published: (2024)

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
by: Liang, Jiacheng, et al.
Published: (2026)

SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
by: Jiang, Fengqing, et al.
Published: (2025)

Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
by: Zaree, Pedram, et al.
Published: (2025)

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)

Automated Consistency Analysis of LLMs
by: Patwardhan, Aditya, et al.
Published: (2025)

Advancing Email Spam Detection: Leveraging Zero-Shot Learning and Large Language Models
by: Shirvani, Ghazaleh, et al.
Published: (2025)

AI-Driven Anonymization: Protecting Personal Data Privacy While Leveraging Machine Learning
by: Yang, Le, et al.
Published: (2024)

Differentially Private Preference Data Synthesis for Large Language Model Alignment
by: Gao, Fengyu, et al.
Published: (2026)

AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
by: Zhang, Yi, et al.
Published: (2025)

Large-scale online deanonymization with LLMs
by: Lermen, Simon, et al.
Published: (2026)

Scaling Trends for Data Poisoning in LLMs
by: Bowen, Dillon, et al.
Published: (2024)

Secure Energy Transactions Using Blockchain Leveraging AI for Fraud Detection and Energy Market Stability
by: Khan, Md Asif Ul Hoq, et al.
Published: (2025)

Injecting Universal Jailbreak Backdoors into LLMs in Minutes
by: Chen, Zhuowei, et al.
Published: (2025)

Adaptive Discounting of Training Time Attacks
by: Bector, Ridhima, et al.
Published: (2024)

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
by: Yuan, Leitao, et al.
Published: (2026)