:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Zihui, Gao, Haichang, Luo, Jiacheng, Liu, Zhaoxiang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Cryptography and Security
Online Access:	https://arxiv.org/abs/2501.13677

Similar Items

Whispers of Data: Unveiling Label Distributions in Federated Learning Through Virtual Client Simulation
by: Ma, Zhixuan, et al.
Published: (2025)

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack
by: Wu, Tongxi, et al.
Published: (2026)

MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
by: Halloran, John
Published: (2025)

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
by: Wu, Zihui, et al.
Published: (2024)

PrefixWall: Mitigating Prefix Caching Side Channels in Shared LLM Systems
by: Pennas, Panagiotis Georgios, et al.
Published: (2026)

AdvPrefix: An Objective for Nuanced LLM Jailbreaks
by: Zhu, Sicheng, et al.
Published: (2024)

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
by: Kumar, Priyanshu, et al.
Published: (2024)

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection
by: Hu, Xulin, et al.
Published: (2026)

NeST: Neuron Selective Tuning for LLM Safety
by: Behrouzi, Sasha, et al.
Published: (2026)

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
by: Lan, Wenhao, et al.
Published: (2026)

AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
by: Liang, Jiacheng, et al.
Published: (2025)

LLM Security and Safety: Insights from Homotopy-Inspired Prompt Obfuscation
by: Lazo, Luis, et al.
Published: (2026)

Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment
by: Ding, Sihao
Published: (2026)

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
by: Liang, Jiacheng, et al.
Published: (2026)

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
by: Sheng, Leheng, et al.
Published: (2025)

Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges
by: Eiras, Francisco, et al.
Published: (2025)

Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search
by: Luo, Zeren, et al.
Published: (2025)

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
by: Yang, Jinluan, et al.
Published: (2024)

N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator
by: Lin, Zheyu, et al.
Published: (2025)

Global Context Enhanced Anomaly Detection of Cyber Attacks via Decoupled Graph Neural Networks
by: Hafez, Ahmad
Published: (2024)

Watermark under Fire: A Robustness Evaluation of LLM Watermarking
by: Liang, Jiacheng, et al.
Published: (2024)

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment
by: Li, Yuxi, et al.
Published: (2024)

MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training
by: Li, Jiacheng, et al.
Published: (2023)

One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy
by: Zhang, Jiacheng, et al.
Published: (2025)

MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification
by: Luo, Xiang, et al.
Published: (2025)

Echoes within the Reasoning: Stealthy and Effective Watermarking via Chain of Thought
by: Lu, Jiacheng, et al.
Published: (2026)

FedReview: A Review Mechanism for Rejecting Poisoned Updates in Federated Learning
by: Zheng, Tianhang, et al.
Published: (2024)

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
by: Chen, Zhaorun, et al.
Published: (2025)

Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift
by: Yuan, Shuai, et al.
Published: (2025)

Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems
by: He, Jun, et al.
Published: (2026)

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
by: Zou, Wei, et al.
Published: (2025)

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs
by: Asif, Sadia, et al.
Published: (2026)

A Systematic Security Evaluation of OpenClaw and Its Variants
by: Wang, Yuhang, et al.
Published: (2026)

PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning
by: Yang, Xingke, et al.
Published: (2025)

DPAR: Decoupled Graph Neural Networks with Node-Level Differential Privacy
by: Zhang, Qiuchen, et al.
Published: (2022)

Model Extraction Attacks Revisited
by: Liang, Jiacheng, et al.
Published: (2023)

LLM Fingerprinting via Semantically Conditioned Watermarks
by: Gloaguen, Thibaud, et al.
Published: (2025)

Improving LLM Safety Alignment with Dual-Objective Optimization
by: Zhao, Xuandong, et al.
Published: (2025)

Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution
by: Wu, Yixin, et al.
Published: (2024)

Passive Inference Attacks on Split Learning via Adversarial Regularization
by: Zhu, Xiaochen, et al.
Published: (2023)