| Main Author: | Rivasseau, Thomas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.12782 |
Similar Items
Hash chaining degrades security at Facebook
by: Rivasseau, Thomas
Published: (2025)
TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
by: Chu, Hua-Rong, et al.
Published: (2026)
Privacy-R1: Privacy-Aware Multi-LLM Agent Collaboration via Reinforcement Learning
by: Hui, Zheng, et al.
Published: (2025)
ContextLeak: Auditing Leakage in Private In-Context Learning Methods
by: Choi, Jacob, et al.
Published: (2025)
Universal and Context-Independent Triggers for Precise Control of LLM Outputs
by: Liang, Jiashuo, et al.
Published: (2024)
Membership Inference Attacks Against In-Context Learning
by: Wen, Rui, et al.
Published: (2024)
ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
by: Si, Wai Man, et al.
Published: (2024)
Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts
by: Uenal, Fatih
Published: (2026)
MPMA: Preference Manipulation Attack Against Model Context Protocol
by: Wang, Zihan, et al.
Published: (2025)
CAPID: Context-Aware PII Detection for Question-Answering Systems
by: Ponomarenko, Mariia, et al.
Published: (2026)
Do Reasoning LLMs Refuse What They Infer in Long Contexts?
by: Fu, Yu, et al.
Published: (2026)
Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning
by: Wang, Yanbo, et al.
Published: (2026)
Federated In-Context LLM Agent Learning
by: Wu, Panlong, et al.
Published: (2024)
TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking
by: Yoon, Sung-Hoon, et al.
Published: (2026)
Watermarking LLM Agent Trajectories
by: Meng, Wenlong, et al.
Published: (2026)
Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning
by: Chen, Chaoran, et al.
Published: (2026)
Proactive defense against LLM Jailbreak
by: Zhao, Weiliang, et al.
Published: (2025)
FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks
by: Chen, Bocheng, et al.
Published: (2024)
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
by: Wang, Yidan, et al.
Published: (2025)
LLM Anonymization Against Agentic Re-Identification
by: Li, Ziwen, et al.
Published: (2026)
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
by: Fu, Jiyuan, et al.
Published: (2025)
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
by: Kim, Sangyeop, et al.
Published: (2025)
MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents
by: Zhou, Zhenhong, et al.
Published: (2026)
Prompt Optimization and Evaluation for LLM Automated Red Teaming
by: Freenor, Michael, et al.
Published: (2025)
Multi-use LLM Watermarking and the False Detection Problem
by: Fu, Zihao, et al.
Published: (2025)
Interpretable LLM Guardrails via Sparse Representation Steering
by: He, Zeqing, et al.
Published: (2025)
FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework
by: Béjar, Mario Rodríguez, et al.
Published: (2026)
On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning
by: Ye, Xiaotian, et al.
Published: (2026)
Security Attacks on LLM-based Code Completion Tools
by: Cheng, Wen, et al.
Published: (2024)
GLiGuard: Schema-Conditioned Classification for LLM Safeguard
by: Zaratiana, Urchade, et al.
Published: (2026)
Confidential Prompting: Privacy-preserving LLM Inference on Cloud
by: Li, Caihua, et al.
Published: (2024)
WorldCup Sampling for Multi-bit LLM Watermarking
by: Wang, Yidan, et al.
Published: (2026)
Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
by: Wang, Junlin, et al.
Published: (2024)
Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
by: Wu, Yu-Hang, et al.
Published: (2025)
Subversion via Focal Points: Investigating Collusion in LLM Monitoring
by: Järviniemi, Olli
Published: (2025)
CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
by: Fu, Wenjie, et al.
Published: (2026)
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level
by: Zeng, Xinyi, et al.
Published: (2024)
Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection
by: Dang, Kieu, et al.
Published: (2026)
"Yes, My LoRD." Guiding Language Model Extraction with Locality Reinforced Distillation
by: Liang, Zi, et al.
Published: (2024)
PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
by: Shen, Guobin, et al.
Published: (2025)