| Main Authors: | Zhu, Pengyu, Sun, Li, Yu, Philip S., Su, Sen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03238 |
Similar Items
A Unified Framework for the Evaluation of LLM Agentic Capabilities
by: Zhu, Pengyu, et al.
Published: (2026)
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
by: Zhu, Pengyu, et al.
Published: (2025)
Exploring the Necessity of Reasoning in LLM-based Agent Scenarios
by: Zhou, Xueyang, et al.
Published: (2025)
EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation
by: Liu, Yi, et al.
Published: (2026)
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
by: Zhao, Hongjue, et al.
Published: (2026)
WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
by: Hu, Chengwei, et al.
Published: (2024)
BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents
by: Feng, Yunhao, et al.
Published: (2026)
LUMIR: an LLM-Driven Unified Agent Framework for Multi-task Infrared Spectroscopy Reasoning
by: Xie, Zujie, et al.
Published: (2025)
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
by: Li, Yuran, et al.
Published: (2025)
ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation
by: Yuan, Boqin, et al.
Published: (2026)
Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization
by: Zhu, Ying, et al.
Published: (2025)
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
by: Song, Yueqi, et al.
Published: (2025)
Unifying Invariant and Variant Features for Graph Out-of-Distribution via Probability of Necessity and Sufficiency
by: Chen, Xuexin, et al.
Published: (2024)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
by: Gu, Yu, et al.
Published: (2024)
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
by: Zhu, Jiachen, et al.
Published: (2025)
Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning
by: Chen, Boyu, et al.
Published: (2024)
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
by: Liang, Yijuan, et al.
Published: (2026)
GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games
by: Li, Yuchen, et al.
Published: (2025)
Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents
by: Li, Yifei, et al.
Published: (2026)
Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization
by: Li, Wenwu, et al.
Published: (2026)
The Explanation Necessity for Healthcare AI
by: Mamalakis, Michail, et al.
Published: (2024)
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
by: Su, Zhaochen, et al.
Published: (2024)
Towards Security-Auditable LLM Agents: A Unified Graph Representation
by: Li, Chaofan, et al.
Published: (2026)
Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification
by: Pandey, Himanshu, et al.
Published: (2024)
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
by: Sun, Yuqiang, et al.
Published: (2024)
Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
by: Cheng, Yize, et al.
Published: (2026)
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction
by: Zhang, Jian, et al.
Published: (2025)
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs
by: Wang, Pengyu, et al.
Published: (2025)
ChainStream: An LLM-based Framework for Unified Synthetic Sensing
by: Liu, Jiacheng, et al.
Published: (2024)
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
by: Zheng, Junhao, et al.
Published: (2025)
Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
by: Ji, Zimo, et al.
Published: (2025)
A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions
by: Wu, Junchao, et al.
Published: (2023)
InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
by: Wu, Yunze, et al.
Published: (2025)
STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics
by: Hui, Tingfeng, et al.
Published: (2026)
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
by: Zhang, Xing, et al.
Published: (2026)
Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics
by: Zhong, Shaoxin, et al.
Published: (2026)
MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
by: Su, Shiqian, et al.
Published: (2026)
Interactional Fairness in LLM Multi-Agent Systems: An Evaluation Framework
by: Binkyte, Ruta
Published: (2025)
Chinese Court Simulation with LLM-Based Agent System
by: Zhang, Kaiyuan, et al.
Published: (2025)
AI Planning Framework for LLM-Based Web Agents
by: Shahnovsky, Orit, et al.
Published: (2026)