:: Library Catalog

Bibliographic Details
Main Author:	Rivasseau, Thomas
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Cryptography and Security
Online Access:	https://arxiv.org/abs/2511.12782

Similar Items

Hash chaining degrades security at Facebook
by: Rivasseau, Thomas
Published: (2025)

TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
by: Chu, Hua-Rong, et al.
Published: (2026)

Privacy-R1: Privacy-Aware Multi-LLM Agent Collaboration via Reinforcement Learning
by: Hui, Zheng, et al.
Published: (2025)

ContextLeak: Auditing Leakage in Private In-Context Learning Methods
by: Choi, Jacob, et al.
Published: (2025)

Universal and Context-Independent Triggers for Precise Control of LLM Outputs
by: Liang, Jiashuo, et al.
Published: (2024)

Membership Inference Attacks Against In-Context Learning
by: Wen, Rui, et al.
Published: (2024)

ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
by: Si, Wai Man, et al.
Published: (2024)

Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts
by: Uenal, Fatih
Published: (2026)

MPMA: Preference Manipulation Attack Against Model Context Protocol
by: Wang, Zihan, et al.
Published: (2025)

CAPID: Context-Aware PII Detection for Question-Answering Systems
by: Ponomarenko, Mariia, et al.
Published: (2026)

Do Reasoning LLMs Refuse What They Infer in Long Contexts?
by: Fu, Yu, et al.
Published: (2026)

Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning
by: Wang, Yanbo, et al.
Published: (2026)

Federated In-Context LLM Agent Learning
by: Wu, Panlong, et al.
Published: (2024)

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking
by: Yoon, Sung-Hoon, et al.
Published: (2026)

Watermarking LLM Agent Trajectories
by: Meng, Wenlong, et al.
Published: (2026)

Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning
by: Chen, Chaoran, et al.
Published: (2026)

Proactive defense against LLM Jailbreak
by: Zhao, Weiliang, et al.
Published: (2025)

FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks
by: Chen, Bocheng, et al.
Published: (2024)

PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
by: Wang, Yidan, et al.
Published: (2025)

LLM Anonymization Against Agentic Re-Identification
by: Li, Ziwen, et al.
Published: (2026)

LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
by: Fu, Jiyuan, et al.
Published: (2025)

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
by: Kim, Sangyeop, et al.
Published: (2025)

MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents
by: Zhou, Zhenhong, et al.
Published: (2026)

Prompt Optimization and Evaluation for LLM Automated Red Teaming
by: Freenor, Michael, et al.
Published: (2025)

Multi-use LLM Watermarking and the False Detection Problem
by: Fu, Zihao, et al.
Published: (2025)

Interpretable LLM Guardrails via Sparse Representation Steering
by: He, Zeqing, et al.
Published: (2025)

FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework
by: Béjar, Mario Rodríguez, et al.
Published: (2026)

On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning
by: Ye, Xiaotian, et al.
Published: (2026)

Security Attacks on LLM-based Code Completion Tools
by: Cheng, Wen, et al.
Published: (2024)

GLiGuard: Schema-Conditioned Classification for LLM Safeguard
by: Zaratiana, Urchade, et al.
Published: (2026)

Confidential Prompting: Privacy-preserving LLM Inference on Cloud
by: Li, Caihua, et al.
Published: (2024)

WorldCup Sampling for Multi-bit LLM Watermarking
by: Wang, Yidan, et al.
Published: (2026)

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
by: Wang, Junlin, et al.
Published: (2024)

Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
by: Wu, Yu-Hang, et al.
Published: (2025)

Subversion via Focal Points: Investigating Collusion in LLM Monitoring
by: Järviniemi, Olli
Published: (2025)

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
by: Fu, Wenjie, et al.
Published: (2026)

Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level
by: Zeng, Xinyi, et al.
Published: (2024)

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection
by: Dang, Kieu, et al.
Published: (2026)

"Yes, My LoRD." Guiding Language Model Extraction with Locality Reinforced Distillation
by: Liang, Zi, et al.
Published: (2024)

PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
by: Shen, Guobin, et al.
Published: (2025)