| Main Authors: | Zhao, Yilei, Zhang, Wentao, Xiao, Lei, Zheng, Yandan, Liu, Mengpu, Lim, Wei Yang Bryan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.08676 |
Similar Items
AgentOrchestra: Orchestrating Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol
by: Zhang, Wentao, et al.
Published: (2025)
STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading
by: Zhao, Yilei, et al.
Published: (2024)
Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation
by: Wu, Zonghan, et al.
Published: (2025)
Agent Manufacturing: Foundation-Model Agents as First-Class Industrial Entities
by: Zhang, Yilei
Published: (2026)
EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems
by: Zhang, Wentao, et al.
Published: (2026)
VoiceAgentEval: A Dual-Dimensional Benchmark for Expert-Level Intelligent Voice-Agent Evaluation of Xbench's Professional-Aligned Series
by: Xu, Pengyu, et al.
Published: (2025)
COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence
by: Li, Wentao, et al.
Published: (2025)
Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments
by: Jia, Zheng, et al.
Published: (2025)
FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence
by: Yan, Xinyu, et al.
Published: (2026)
AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models
by: Zhang, Wentao, et al.
Published: (2026)
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
by: He, Chaoyue, et al.
Published: (2025)
FinWorld: An All-in-One Open-Source Platform for End-to-End Financial AI Research and Deployment
by: Zhang, Wentao, et al.
Published: (2025)
Empowering Sustainable Finance with Artificial Intelligence: A Framework for Responsible Implementation
by: Pavlidis, Georgios
Published: (2025)
Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks
by: Meng, Xianhui, et al.
Published: (2025)
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
by: Lei, Fangyu, et al.
Published: (2025)
ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
by: Ma, Menghe, et al.
Published: (2026)
GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis
by: Yu, Bo, et al.
Published: (2026)
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)
AI Agents for Sustainable SMEs: A Green ESG Assessment Framework
by: Trinh, Viet, et al.
Published: (2026)
Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation
by: Lei, Yinjie, et al.
Published: (2023)
EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks
by: Yang, Xiao, et al.
Published: (2025)
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
by: Zhang, Wentao, et al.
Published: (2024)
DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis
by: Qi, Ruyi, et al.
Published: (2026)
ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
by: Sun, Siqi, et al.
Published: (2026)
ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding
by: Dai, Xinbang, et al.
Published: (2025)
Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework
by: Lee, Sung Une, et al.
Published: (2024)
Data and System Perspectives of Sustainable Artificial Intelligence
by: Xie, Tao, et al.
Published: (2025)
ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices
by: Kong, Dezhi, et al.
Published: (2026)
SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents
by: Hu, Wentao, et al.
Published: (2026)
TestAgent: An Adaptive and Intelligent Expert for Human Assessment
by: Yu, Junhao, et al.
Published: (2025)
ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
by: Yang, Xinwei, et al.
Published: (2025)
XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition
by: Zhiren, Gong, et al.
Published: (2026)
FORTIS: Benchmarking Over-Privilege in Agent Skills
by: Li, Shawn, et al.
Published: (2026)
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
by: Chen, Jingxuan, et al.
Published: (2024)
Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents
by: Li, Jiaxing, et al.
Published: (2026)
MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment
by: Wu, Qinzhuo, et al.
Published: (2026)
Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing
by: Yang, Minglai, et al.
Published: (2026)
Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
by: Dong, Haonan, et al.
Published: (2026)
General-Purpose Aerial Intelligent Agents Empowered by Large Language Models
by: Zhao, Ji, et al.
Published: (2025)
Bench-CoE: a Framework for Collaboration of Experts from Benchmark
by: Wang, Yuanshuai, et al.
Published: (2024)