| Main Authors: | Wang, Yuhui; Zhu, Rongyi; Wang, Ting |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.12186 |
Similar Items
GraphRAG under Fire
by: Liang, Jiacheng, et al.
Published: (2025)
RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
by: Liang, Jiacheng, et al.
Published: (2026)
AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
by: Liang, Jiacheng, et al.
Published: (2025)
ST-DPGAN: A Privacy-preserving Framework for Spatiotemporal Data Generation
by: Shao, Wei, et al.
Published: (2024)
Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
by: Kuikel, Shova, et al.
Published: (2025)
NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
by: Wang, Zhaohui Geoffrey
Published: (2026)
Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models
by: Wang, Jeffrey G., et al.
Published: (2024)
Jailbroken Frontier Models Retain Their Capabilities
by: Zhu, Daniel, et al.
Published: (2026)
Hijack Vertical Federated Learning Models As One Party
by: Qiu, Pengyu, et al.
Published: (2022)
Reconstruction of Differentially Private Text Sanitization via Large Language Models
by: Pang, Shuchao, et al.
Published: (2024)
PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
by: Ertan, Murat Bilgehan, et al.
Published: (2026)
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
by: Rashid, Md Rafi Ur, et al.
Published: (2024)
Jailbreaking and Mitigation of Vulnerabilities in Large Language Models
by: Peng, Benji, et al.
Published: (2024)
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
by: Qiu, Le, et al.
Published: (2025)
Your Agent Can Defend Itself against Backdoor Attacks
by: Changjiang, Li, et al.
Published: (2025)
PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration
by: Liu, Yi, et al.
Published: (2025)
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
by: Liu, Guozhi, et al.
Published: (2025)
Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation
by: Desu, Aditya, et al.
Published: (2024)
OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences
by: Wang, Kaixiang, et al.
Published: (2026)
The Steganographic Potentials of Language Models
by: Karpov, Artem, et al.
Published: (2025)
Watermarking Diffusion Language Models
by: Gloaguen, Thibaud, et al.
Published: (2025)
Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework
by: Chen, Shengchao, et al.
Published: (2024)
Model-based Large Language Model Customization as Service
by: Wu, Zhaomin, et al.
Published: (2024)
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models
by: Fang, Junfeng, et al.
Published: (2025)
Privacy Auditing of Large Language Models
by: Panda, Ashwinee, et al.
Published: (2025)
Watermark Stealing in Large Language Models
by: Jovanović, Nikola, et al.
Published: (2024)
Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses
by: Yichao, Wu, et al.
Published: (2025)
PROPS: Progressively Private Self-alignment of Large Language Models
by: Teku, Noel, et al.
Published: (2025)
Exploring the Secondary Risks of Large Language Models
by: Chen, Jiawei, et al.
Published: (2025)
Discovering Spoofing Attempts on Language Model Watermarks
by: Gloaguen, Thibaud, et al.
Published: (2024)
Finetuning Large Language Models for Vulnerability Detection
by: Shestov, Alexey, et al.
Published: (2024)
A Survey on Model Extraction Attacks and Defenses for Large Language Models
by: Zhao, Kaixiang, et al.
Published: (2025)
Rounding-Guided Backdoor Injection in Deep Learning Model Quantization
by: Chen, Xiangxiang, et al.
Published: (2025)
Improved Algorithms for Differentially Private Language Model Alignment
by: Chen, Keyu, et al.
Published: (2025)
Adaptive PII Mitigation Framework for Large Language Models
by: Asthana, Shubhi, et al.
Published: (2025)
Large Language Models Are Unreliable for Cyber Threat Intelligence
by: Mezzi, Emanuele, et al.
Published: (2025)
Efficient Decoding Methods for Language Models on Encrypted Data
by: Avitan, Matan, et al.
Published: (2025)
Prompt Injection Attacks on Large Language Models in Oncology
by: Clusmann, Jan, et al.
Published: (2024)
In-Context Unlearning: Language Models as Few Shot Unlearners
by: Pawelczyk, Martin, et al.
Published: (2023)
Towards Characterizing Cyber Networks with Large Language Models
by: Hartsock, Alaric, et al.
Published: (2024)