| Main Authors: | Zhang, Yihao, Wang, Kai, Wu, Jiangrong, Wu, Haolin, Zhou, Yuxuan, Wei, Zeming, Wu, Dongxian, Chen, Xun, Sun, Jun, Sun, Meng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.11309 |
Similar Items
ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems
by: Zhang, Yihao, et al.
Published: (2026)
MILE: A Mutation Testing Framework of In-Context Learning Systems
by: Wei, Zeming, et al.
Published: (2024)
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)
Secure LLM Fine-Tuning via Safety-Aware Probing
by: Wu, Chengcan, et al.
Published: (2025)
RACC: Representation-Aware Coverage Criteria for LLM Safety Testing
by: Wei, Zeming, et al.
Published: (2026)
Automata-Based Steering of Large Language Models for Diverse Structured Generation
by: Luan, Xiaokun, et al.
Published: (2025)
Boosting Jailbreak Attack with Momentum
by: Zhang, Yihao, et al.
Published: (2024)
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
by: Wang, Kai, et al.
Published: (2025)
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
by: Zhang, Zhixin, et al.
Published: (2025)
ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction
by: Wei, Zeming, et al.
Published: (2025)
Control at Stake: Evaluating the Security Landscape of LLM-Driven Email Agents
by: Wu, Jiangrong, et al.
Published: (2025)
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
by: Yang, Wenkai, et al.
Published: (2024)
ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation
by: Wu, Yiran, et al.
Published: (2025)
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation
by: Jia, Xiaojun, et al.
Published: (2025)
Conversations Risk Detection LLMs in Financial Agents via Multi-Stage Generative Rollout
by: Jiang, Xiaotong, et al.
Published: (2026)
Exploring the Robustness of In-Context Learning with Noisy Labels
by: Cheng, Chen, et al.
Published: (2024)
ChainFuzzer: Greybox Fuzzing for Workflow-Level Multi-Tool Vulnerabilities in LLM Agents
by: Wu, Jiangrong, et al.
Published: (2026)
Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks
by: Wang, Rui, et al.
Published: (2025)
The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis
by: Wang, Peiran, et al.
Published: (2026)
Security Attacks on LLM-based Code Completion Tools
by: Cheng, Wen, et al.
Published: (2024)
LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification
by: Tang, Fengxiao, et al.
Published: (2025)
RoMA: Robust Malware Attribution via Byte-level Adversarial Training with Global Perturbations and Adversarial Consistency Regularization
by: Sun, Yuxia, et al.
Published: (2025)
From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions
by: Peng, Jiaren, et al.
Published: (2025)
Exploit the Leak: Understanding Risks in Biometric Matchers
by: Durbet, Axel, et al.
Published: (2023)
Securing Multi-Agent Systems Against Corruptions via Node Contribution Backpropagation
by: Wu, Chengcan, et al.
Published: (2025)
The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections
by: Chen, Chaoran, et al.
Published: (2025)
Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
by: Wu, Yu-Hang, et al.
Published: (2025)
Self-Disguise Attack: Induce the LLM to disguise itself for AIGT detection evasion
by: Zhou, Yinghan, et al.
Published: (2025)
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
by: Ren, Xiaoxue, et al.
Published: (2025)
Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?
by: Bhandarkar, Avanti, et al.
Published: (2024)
Resource Consumption Threats in Large Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)
Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges
by: Ding, Ruomeng, et al.
Published: (2026)
On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning
by: Ye, Xiaotian, et al.
Published: (2026)
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
by: Wen, Rui, et al.
Published: (2026)
BraveGuard: From Open-World Threats to Safer Computer-Use Agents
by: Feng, Yunhao, et al.
Published: (2026)
From Perception to Protection: A Developer-Centered Study of Security and Privacy Threats in Extended Reality (XR)
by: Cai, Kunlin, et al.
Published: (2025)
Can LLM Infer Risk Information From MCP Server System Logs?
by: Fu, Jiayi, et al.
Published: (2025)
RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
by: Wei, Zeming, et al.
Published: (2026)
Generalized Security-Preserving Refinement for Concurrent Systems
by: Sun, Huan, et al.
Published: (2025)
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
by: Piet, Julien, et al.
Published: (2023)