| Main Authors: | Roh, Jaechul, Gandhi, Varun, Anilkumar, Shivani, Garg, Arin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.06971 |
Similar Items
Multilingual and Multi-Accent Jailbreaking of Audio LLMs
by: Roh, Jaechul, et al.
Published: (2025)
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026)
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
by: Naseh, Ali, et al.
Published: (2025)
Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs
by: Sternfeld, Alexander, et al.
Published: (2026)
Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs
by: Li, Xiang, et al.
Published: (2025)
SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization
by: Liu, Houjun, et al.
Published: (2026)
PurpCode: Reasoning for Safer Code Generation
by: Liu, Jiawei, et al.
Published: (2025)
Fingerprinting LLMs via Prompt Injection
by: Hu, Yuepeng, et al.
Published: (2025)
FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
by: Roh, Jaechul, et al.
Published: (2024)
Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
by: Shukla, Shivani, et al.
Published: (2025)
OverThink: Slowdown Attacks on Reasoning LLMs
by: Kumar, Abhinav, et al.
Published: (2025)
Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation
by: Mohseni, Seyedreza, et al.
Published: (2024)
Efficient Provably Secure Linguistic Steganography via Range Coding
by: Yan, Ruiyi, et al.
Published: (2026)
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
by: Li, Linbao, et al.
Published: (2025)
The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs
by: Berezin, Sergey, et al.
Published: (2025)
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
by: Paulus, Anselm, et al.
Published: (2024)
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
by: Xie, Yueqi, et al.
Published: (2024)
ProSec: Fortifying Code LLMs with Proactive Security Alignment
by: Xu, Xiangzhe, et al.
Published: (2024)
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
by: Qu, Yubin, et al.
Published: (2026)
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
by: Wahed, Muntasir, et al.
Published: (2025)
Enhancing Robustness of AI Offensive Code Generators via Data Augmentation
by: Improta, Cristina, et al.
Published: (2023)
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs
by: Wang, Jiawen, et al.
Published: (2025)
LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs
by: Saha, Akashdeep, et al.
Published: (2025)
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
by: Maloyan, Narek, et al.
Published: (2025)
Query-Based Adversarial Prompt Generation
by: Hayase, Jonathan, et al.
Published: (2024)
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model
by: Wu, Tianyi, et al.
Published: (2026)
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
by: Li, Qizhang, et al.
Published: (2024)
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)
SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts
by: Xin, Yuan, et al.
Published: (2026)
From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security
by: Basic, Enna, et al.
Published: (2024)
CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants
by: Noah, Amit Finkman, et al.
Published: (2024)
TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity
by: Cao, Xi, et al.
Published: (2024)
Throttling Web Agents Using Reasoning Gates
by: Kumar, Abhinav, et al.
Published: (2025)
Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning
by: Du, Xiaohu, et al.
Published: (2024)
Security Attacks on LLM-based Code Completion Tools
by: Cheng, Wen, et al.
Published: (2024)
Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-based Prompt Injection Attacks via the Fine-Tuning Interface
by: Labunets, Andrey, et al.
Published: (2025)
Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach
by: Zhang, Chong, et al.
Published: (2025)
OSLO: One-Shot Label-Only Membership Inference Attacks
by: Peng, Yuefeng, et al.
Published: (2024)
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
by: Zhu, Kaijie, et al.
Published: (2023)