:: Library Catalog

Bibliographic Details
Main Authors:	Ji, Zhenlan; Wu, Daoyuan; Ma, Pingchuan; Li, Zongjie; Wang, Shuai
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Programming Languages
Online Access:	https://arxiv.org/abs/2404.17833

Similar Items

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems
by: Wang, Liwen, et al.
Published: (2025)

Split and Merge: Aligning Position Biases in LLM-based Evaluators
by: Li, Zongjie, et al.
Published: (2023)

Digging Into the Internal: Causality-Based Analysis of LLM Function Calling
by: Ji, Zhenlan, et al.
Published: (2025)

Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges
by: Ji, Zimo, et al.
Published: (2025)

SoK: Evaluating Jailbreak Guardrails for Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
by: Wang, Xunguang, et al.
Published: (2024)

Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
by: Ji, Zimo, et al.
Published: (2025)

Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
by: Wang, Xunguang, et al.
Published: (2026)

An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
by: Li, Zongjie, et al.
Published: (2024)

Evaluating LLMs on Sequential API Call Through Automated Test Generation
by: Huang, Yuheng, et al.
Published: (2025)

API-guided Dataset Synthesis to Finetune Large Code Models
by: Li, Zongjie, et al.
Published: (2024)

Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs
by: Li, Zongjie, et al.
Published: (2025)

Understanding and Bridging the Planner-Coder Gap: A Systematic Study on the Robustness of Multi-Agent Systems for Code Generation
by: Lyu, Zongyi, et al.
Published: (2025)

Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators
by: Zhang, Kunpeng, et al.
Published: (2025)

EAMET: Robust Massive Model Editing via Embedding Alignment Optimization
by: Dai, Yanbo, et al.
Published: (2025)

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
by: Wang, Xunguang, et al.
Published: (2023)

WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making
by: Li, Zongjie, et al.
Published: (2026)

Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning
by: Dai, Yanbo, et al.
Published: (2025)

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
by: Huang, Ruixuan, et al.
Published: (2025)

Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators
by: Sun, Maolin, et al.
Published: (2025)

Provable Coordination for LLM Agents via Message Sequence Charts
by: Bollig, Benedikt, et al.
Published: (2026)

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework
by: Ji, Zimo, et al.
Published: (2026)

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
by: Wei, Anjiang, et al.
Published: (2025)

An LLM-powered Natural-to-Robotic Language Translation Framework with Correctness Guarantees
by: Chen, ZhenDong, et al.
Published: (2025)

Benchmark Test-Time Scaling of General LLM Agents
by: Li, Xiaochuan, et al.
Published: (2026)

OBsmith: LLM-Powered JavaScript Obfuscator Testing
by: Jiang, Shan, et al.
Published: (2025)

AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents
by: Xu, Shuyuan, et al.
Published: (2024)

Generating Pragmatic Examples to Train Neural Program Synthesizers
by: Vaduguru, Saujas, et al.
Published: (2023)

AutoPDL: Automatic Prompt Optimization for LLM Agents
by: Spiess, Claudio, et al.
Published: (2025)

SEAL: Subspace-Anchored Watermarks for LLM Ownership
by: Dai, Yanbo, et al.
Published: (2025)

LACUNA: Safe Agents as Recursive Program Holes
by: Zhao, Yaoyu, et al.
Published: (2026)

BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
by: Zhang, Zehua, et al.
Published: (2025)

Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation
by: Niu, Cheng, et al.
Published: (2024)

Inference Plans for Hybrid Particle Filtering
by: Cheng, Ellie Y., et al.
Published: (2024)

A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows
by: Daunis, Ivan
Published: (2025)

Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
by: Tang, Shuo, et al.
Published: (2024)

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
by: Liu, Max, et al.
Published: (2024)

CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization
by: Zhao, Yang, et al.
Published: (2024)

Enforcing Temporal Constraints for LLM Agents
by: Kamath, Adharsh, et al.
Published: (2025)