:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Pengyu; Sun, Li; Yu, Philip S.; Su, Sen
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.03238

Similar Items

A Unified Framework for the Evaluation of LLM Agentic Capabilities
by: Zhu, Pengyu, et al.
Published: (2026)

DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
by: Zhu, Pengyu, et al.
Published: (2025)

Exploring the Necessity of Reasoning in LLM-based Agent Scenarios
by: Zhou, Xueyang, et al.
Published: (2025)

EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation
by: Liu, Yi, et al.
Published: (2026)

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
by: Zhao, Hongjue, et al.
Published: (2026)

WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
by: Hu, Chengwei, et al.
Published: (2024)

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents
by: Feng, Yunhao, et al.
Published: (2026)

LUMIR: an LLM-Driven Unified Agent Framework for Multi-task Infrared Spectroscopy Reasoning
by: Xie, Zujie, et al.
Published: (2025)

Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
by: Li, Yuran, et al.
Published: (2025)

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation
by: Yuan, Boqin, et al.
Published: (2026)

Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization
by: Zhu, Ying, et al.
Published: (2025)

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
by: Song, Yueqi, et al.
Published: (2025)

Unifying Invariant and Variant Features for Graph Out-of-Distribution via Probability of Necessity and Sufficiency
by: Chen, Xuexin, et al.
Published: (2024)

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
by: Gu, Yu, et al.
Published: (2024)

Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
by: Zhu, Jiachen, et al.
Published: (2025)

Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning
by: Chen, Boyu, et al.
Published: (2024)

UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
by: Liang, Yijuan, et al.
Published: (2026)

GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games
by: Li, Yuchen, et al.
Published: (2025)

Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents
by: Li, Yifei, et al.
Published: (2026)

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization
by: Li, Wenwu, et al.
Published: (2026)

The Explanation Necessity for Healthcare AI
by: Mamalakis, Michail, et al.
Published: (2024)

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
by: Su, Zhaochen, et al.
Published: (2024)

Towards Security-Auditable LLM Agents: A Unified Graph Representation
by: Li, Chaofan, et al.
Published: (2026)

Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification
by: Pandey, Himanshu, et al.
Published: (2024)

LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
by: Sun, Yuqiang, et al.
Published: (2024)

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
by: Cheng, Yize, et al.
Published: (2026)

GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction
by: Zhang, Jian, et al.
Published: (2025)

MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs
by: Wang, Pengyu, et al.
Published: (2025)

ChainStream: An LLM-based Framework for Unified Synthetic Sensing
by: Liu, Jiacheng, et al.
Published: (2024)

LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
by: Zheng, Junhao, et al.
Published: (2025)

Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
by: Ji, Zimo, et al.
Published: (2025)

A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions
by: Wu, Junchao, et al.
Published: (2023)

InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
by: Wu, Yunze, et al.
Published: (2025)

STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics
by: Hui, Tingfeng, et al.
Published: (2026)

Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
by: Zhang, Xing, et al.
Published: (2026)

Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics
by: Zhong, Shaoxin, et al.
Published: (2026)

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
by: Su, Shiqian, et al.
Published: (2026)

Interactional Fairness in LLM Multi-Agent Systems: An Evaluation Framework
by: Binkyte, Ruta
Published: (2025)

Chinese Court Simulation with LLM-Based Agent System
by: Zhang, Kaiyuan, et al.
Published: (2025)

AI Planning Framework for LLM-Based Web Agents
by: Shahnovsky, Orit, et al.
Published: (2026)