| Main Authors: | Li, Xuying, Li, Zhuo, Kosuga, Yuji, Yoshida, Yasuhiro, Bian, Victor |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.04415 |
Similar Items
Precision Knowledge Editing: Enhancing Safety in Large Language Models
by: Li, Xuying, et al.
Published: (2024)
Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
by: Li, Xuying, et al.
Published: (2025)
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
by: Shu, Huizhen, et al.
Published: (2025)
Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach
by: Li, Xuying, et al.
Published: (2025)
LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation
by: Shu, Huizhen, et al.
Published: (2025)
The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
by: Li, Ang, et al.
Published: (2025)
Targeted Bit-Flip Attacks on LLM-Based Agents
by: Wang, Jialai, et al.
Published: (2026)
Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors
by: Li, Xuying
Published: (2025)
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
by: Lin, Xixun, et al.
Published: (2025)
E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness
by: Zhao, Yibo, et al.
Published: (2025)
CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent
by: Ning, Liang-bo, et al.
Published: (2025)
FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents
by: Li, Qizheng, et al.
Published: (2026)
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
by: Li, Yige, et al.
Published: (2025)
Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
by: Johnson, Sam, et al.
Published: (2025)
BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents
by: Feng, Yunhao, et al.
Published: (2026)
IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems
by: Wang, Liwen, et al.
Published: (2025)
Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?
by: Chen, Wanyi, et al.
Published: (2026)
Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack
by: Kang, Daewon, et al.
Published: (2025)
S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
by: Liu, Yongxiang, et al.
Published: (2024)
RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
by: Ikbarieh, Seif, et al.
Published: (2025)
SEP-Attack: A Simple and Effective Paradigm for Transfer-Based Textual Adversarial Attack
by: Liu, Han, et al.
Published: (2026)
E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems
by: Guan, Zelin, et al.
Published: (2026)
Micro-Act: Mitigating Knowledge Conflict in LLM-based RAG via Actionable Self-Reasoning
by: Huo, Nan, et al.
Published: (2025)
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
by: Mo, Kanghua, et al.
Published: (2025)
EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
by: Zhang, Yunxiao, et al.
Published: (2025)
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG
by: Du, Xueying, et al.
Published: (2024)
Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate
by: Qi, Senmao, et al.
Published: (2025)
A Concurrent Modular Agent: Framework for Autonomous LLM Agents
by: Maruyama, Norihiro, et al.
Published: (2025)
An Iterative LLM Framework for SIBT utilizing RAG-based Adaptive Weight Optimization
by: Xiao, Zhuo, et al.
Published: (2025)
Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
by: Wang, Boxuan, et al.
Published: (2026)
A Simple and Effective Method for Uncertainty Quantification and OOD Detection
by: Ma, Yaxin, et al.
Published: (2025)
ALERT: Zero-shot LLM Jailbreak Detection via Internal Discrepancy Amplification
by: Lin, Xiao, et al.
Published: (2026)
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
by: Shen, Zeyu, et al.
Published: (2025)
SimpleMem: Efficient Lifelong Memory for LLM Agents
by: Liu, Jiaqi, et al.
Published: (2026)
Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models
by: Luo, Jiayi, et al.
Published: (2025)
AgentInit: Initializing LLM-based Multi-Agent Systems via Diversity and Expertise Orchestration for Effective and Efficient Collaboration
by: Tian, Chunhao, et al.
Published: (2025)
Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey
by: Guan, Shengyue, et al.
Published: (2025)
RAG-Enhanced Collaborative LLM Agents for Drug Discovery
by: Lee, Namkyeong, et al.
Published: (2025)
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation
by: Ye, Ziang, et al.
Published: (2025)