| Main Authors: | Xu, Shihao, Zhou, Tiancheng, Ma, Jiatong, Ding, Yanli, Yan, Yiming, Xiao, Ming, Li, Guoyi, Geng, Haiyang, Han, Yunyun, Chen, Jianhua, Deng, Yafeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.09379 |
Similar Items
MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation
by: Li, Guoyi, et al.
Published: (2026)
Empowering Medical Multi-Agents with Clinical Consultation Flow for Dynamic Diagnosis
by: Wang, Sihan, et al.
Published: (2025)
LLM-ABM for Transportation: Assessing the Potential of LLM Agents in System Analysis
by: Liu, Tianming, et al.
Published: (2025)
MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus
by: Li, Zheng, et al.
Published: (2026)
AgentWebBench: Benchmarking Multi-Agent Coordination in Agentic Web
by: Zhong, Shanshan, et al.
Published: (2026)
Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework
by: Liu, Tianming, et al.
Published: (2024)
MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents
by: Jiang, Yixing, et al.
Published: (2025)
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
by: Feng, Yukang, et al.
Published: (2026)
Benchmarking LLMs' Swarm intelligence
by: Ruan, Kai, et al.
Published: (2025)
SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers
by: Xiang, Yanzheng, et al.
Published: (2025)
Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
by: Liu, Pengju, et al.
Published: (2026)
CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs
by: Zou, Chelsea, et al.
Published: (2026)
Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models
by: Lin, Cooper, et al.
Published: (2026)
Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations
by: Han, Xudong, et al.
Published: (2025)
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
by: Lai, Eugenie, et al.
Published: (2025)
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
by: Bettini, Matteo, et al.
Published: (2023)
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
by: Patel, Dhaval, et al.
Published: (2025)
AgentSearchBench: A Benchmark for AI Agent Search in the Wild
by: Wu, Bin, et al.
Published: (2026)
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
by: Styles, Olly, et al.
Published: (2024)
Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?
by: Yuan, Grace Chang, et al.
Published: (2026)
MedPriv-Bench: Benchmarking the Privacy-Utility Trade-off of Large Language Models in Medical Open-End Question Answering
by: Guan, Shaowei, et al.
Published: (2026)
How Real Is AI Tutoring? Comparing Simulated and Human Dialogues in One-on-One Instruction
by: Li, Ruijia, et al.
Published: (2025)
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
by: Siegel, Zachary S., et al.
Published: (2024)
ALAS: Transactional and Dynamic Multi-Agent LLM Planning
by: Geng, Longling, et al.
Published: (2025)
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
by: Tihanyi, Norbert, et al.
Published: (2024)
SpecBench: Evaluating Specification-Level Reasoning for Software Engineering LLM Agents
by: Hamblin, Grant, et al.
Published: (2026)
Evaluating Multi-Agent LLM Architectures for Rare Disease Diagnosis
by: Almasoud, Ahmed
Published: (2026)
SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis
by: Arshad, Muhammad Arbab, et al.
Published: (2026)
Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
by: Magnino, Lorenzo, et al.
Published: (2026)
DDO: Dual-Decision Optimization for LLM-Based Medical Consultation via Multi-Agent Collaboration
by: Jia, Zhihao, et al.
Published: (2025)
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
by: Zhao, Yujie, et al.
Published: (2025)
Multi-Agent Reinforcement Learning for Multi-Cell Spectrum and Power Allocation
by: Zhang, Yiming, et al.
Published: (2023)
Second Order Statistics Analysis and Comparison between Arithmetic and Geometric Average Fusion
by: Li, Tiancheng, et al.
Published: (2019)
Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning
by: Yao, Junchi, et al.
Published: (2025)
Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs
by: Mao, Yuren, et al.
Published: (2025)
Self-Organizing Agent Network for LLM-based Workflow Automation
by: Xiong, Yiming, et al.
Published: (2025)
Collaborative QA using Interacting LLMs. Impact of Network Structure, Node Capability and Distributed Data
by: Jain, Adit, et al.
Published: (2025)
Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models
by: Liu, Ziqi, et al.
Published: (2025)
Mapis: A Knowledge-Graph Grounded Multi-Agent Framework for Evidence-Based PCOS Diagnosis
by: He, Zanxiang, et al.
Published: (2025)
PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization
by: Xiang, Dawei, et al.
Published: (2025)