| Main Author: | Halloran, John T. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.11217 |
Similar Items
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits
by: Radosevich, Brandon, et al.
Published: (2025)
Understanding the Effects of Safety Unalignment on Large Language Models
by: Halloran, John T.
Published: (2026)
Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks
by: Halloran, John T., et al.
Published: (2026)
MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
by: Halloran, John
Published: (2025)
Private-RAG: Answering Multiple Queries with LLMs while Keeping Your Data Private
by: Wu, Ruihan, et al.
Published: (2025)
RAG with Differential Privacy
by: Grislain, Nicolas
Published: (2024)
SME-TEAM: Leveraging Trust and Ethics for Secure and Responsible Use of AI and LLMs in SMEs
by: Sarker, Iqbal H., et al.
Published: (2025)
GraphRAG under Fire
by: Liang, Jiacheng, et al.
Published: (2025)
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
by: Ghosal, Soumya Suvra, et al.
Published: (2024)
Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning
by: Zhou, Xinjie, et al.
Published: (2026)
Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Model Watermarking
by: Kong, Cong, et al.
Published: (2024)
CLASP: Training-Free LLM-Assisted Source Code Watermarking via Semantic-Preserving Transformations
by: Xu, Rui, et al.
Published: (2025)
Ward: Provable RAG Dataset Inference via LLM Watermarks
by: Jovanović, Nikola, et al.
Published: (2024)
Res-MIA: A Training-Free Resolution-Based Membership Inference Attack on Federated Learning Models
by: Zare, Mohammad, et al.
Published: (2026)
Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
by: Thornton, Scott
Published: (2026)
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)
Topological Signatures of Adversaries in Multimodal Alignments
by: Vu, Minh, et al.
Published: (2025)
How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System
by: Zuo, Kaiwen, et al.
Published: (2025)
No More, No Less: Task Alignment in Terminal Agents
by: Mavali, Sina, et al.
Published: (2026)
MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks
by: Zhang, Jiaming, et al.
Published: (2023)
Leveraging AI to optimize website structure discovery during Penetration Testing
by: Antonelli, Diego, et al.
Published: (2021)
Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations
by: Wang, Cheng, et al.
Published: (2024)
Improved Algorithms for Differentially Private Language Model Alignment
by: Chen, Keyu, et al.
Published: (2025)
Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment
by: Li, Yuxi, et al.
Published: (2024)
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
by: Rashid, Md Rafi Ur, et al.
Published: (2024)
RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
by: Liang, Jiacheng, et al.
Published: (2026)
SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
by: Jiang, Fengqing, et al.
Published: (2025)
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
by: Zaree, Pedram, et al.
Published: (2025)
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)
Automated Consistency Analysis of LLMs
by: Patwardhan, Aditya, et al.
Published: (2025)
Advancing Email Spam Detection: Leveraging Zero-Shot Learning and Large Language Models
by: Shirvani, Ghazaleh, et al.
Published: (2025)
AI-Driven Anonymization: Protecting Personal Data Privacy While Leveraging Machine Learning
by: Yang, Le, et al.
Published: (2024)
Differentially Private Preference Data Synthesis for Large Language Model Alignment
by: Gao, Fengyu, et al.
Published: (2026)
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
by: Zhang, Yi, et al.
Published: (2025)
Large-scale online deanonymization with LLMs
by: Lermen, Simon, et al.
Published: (2026)
Scaling Trends for Data Poisoning in LLMs
by: Bowen, Dillon, et al.
Published: (2024)
Secure Energy Transactions Using Blockchain Leveraging AI for Fraud Detection and Energy Market Stability
by: Khan, Md Asif Ul Hoq, et al.
Published: (2025)
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
by: Chen, Zhuowei, et al.
Published: (2025)
Adaptive Discounting of Training Time Attacks
by: Bector, Ridhima, et al.
Published: (2024)
Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
by: Yuan, Leitao, et al.
Published: (2026)