:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Hanxiu; Zheng, Yue
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2512.03620

Similar Items

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)

BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints
by: Gill, Waris, et al.
Published: (2025)

Instructional Fingerprinting of Large Language Models
by: Xu, Jiashu, et al.
Published: (2024)

Are Robust LLM Fingerprints Adversarially Robust?
by: Nasery, Anshul, et al.
Published: (2025)

A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens
by: Dobre, David, et al.
Published: (2025)

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training
by: Labiad, Ismail, et al.
Published: (2025)

Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
by: Ahmed, Mohamed, et al.
Published: (2025)

LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis
by: Alhazbi, Saeif, et al.
Published: (2025)

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
by: Zheng, Xiaosen, et al.
Published: (2024)

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
by: Wang, Kun, et al.
Published: (2025)

BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
by: Wang, Yifei, et al.
Published: (2024)

Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis
by: Zhang, Haoyu, et al.
Published: (2026)

SVIP: Towards Verifiable Inference of Open-source Large Language Models
by: Sun, Yifan, et al.
Published: (2024)

Linking Cryptoasset Attribution Tags to Knowledge Graph Entities: An LLM-based Approach
by: Avice, Régnier, et al.
Published: (2025)

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
by: Wu, Tong, et al.
Published: (2024)

PostMark: A Robust Blackbox Watermark for Large Language Models
by: Chang, Yapei, et al.
Published: (2024)

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing
by: Li, Yuexin, et al.
Published: (2026)

From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
by: Xu, Xiaoyu, et al.
Published: (2026)

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening
by: Zhang, Mohan, et al.
Published: (2026)

Towards Understanding the Robustness of Sparse Autoencoders
by: Saiyed, Ahson, et al.
Published: (2026)

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)

STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents
by: Li, Jing-Jing, et al.
Published: (2025)

Federated In-Context LLM Agent Learning
by: Wu, Panlong, et al.
Published: (2024)

Exploring the Robustness of In-Context Learning with Noisy Labels
by: Cheng, Chen, et al.
Published: (2024)

Towards Building a Robust Toxicity Predictor
by: Bespalov, Dmitriy, et al.
Published: (2024)

On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)

AdvPrefix: An Objective for Nuanced LLM Jailbreaks
by: Zhu, Sicheng, et al.
Published: (2024)

Policy-Invisible Violations in LLM-Based Agents
by: Wu, Jie, et al.
Published: (2026)

Certifying LLM Safety against Adversarial Prompting
by: Kumar, Aounon, et al.
Published: (2023)

GaussMark: A Practical Approach for Structural Watermarking of Language Models
by: Block, Adam, et al.
Published: (2025)

Directional Embedding Smoothing for Robust Vision Language Models
by: Wang, Ye, et al.
Published: (2026)

Adversarial Text Purification: A Large Language Model Approach for Defense
by: Moraffah, Raha, et al.
Published: (2024)

HSF: Defending against Jailbreak Attacks with Hidden State Filtering
by: Qian, Cheng, et al.
Published: (2024)

Efficient LLM Moderation with Multi-Layer Latent Prototypes
by: Chrabąszcz, Maciej, et al.
Published: (2025)

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
by: Halawi, Danny, et al.
Published: (2024)

Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2025)

Probing the Robustness of Large Language Models Safety to Latent Perturbations
by: Gu, Tianle, et al.
Published: (2025)