:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Yilei, Zhang, Wentao, Xiao, Lei, Zheng, Yandan, Liu, Mengpu, Lim, Wei Yang Bryan
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.08676

Similar Items

AgentOrchestra: Orchestrating Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol
by: Zhang, Wentao, et al.
Published: (2025)

STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading
by: Zhao, Yilei, et al.
Published: (2024)

Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation
by: Wu, Zonghan, et al.
Published: (2025)

Agent Manufacturing: Foundation-Model Agents as First-Class Industrial Entities
by: Zhang, Yilei
Published: (2026)

EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems
by: Zhang, Wentao, et al.
Published: (2026)

VoiceAgentEval: A Dual-Dimensional Benchmark for Expert-Level Intelligent Voice-Agent Evaluation of Xbench's Professional-Aligned Series
by: Xu, Pengyu, et al.
Published: (2025)

COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence
by: Li, Wentao, et al.
Published: (2025)

Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments
by: Jia, Zheng, et al.
Published: (2025)

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence
by: Yan, Xinyu, et al.
Published: (2026)

AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models
by: Zhang, Wentao, et al.
Published: (2026)

ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
by: He, Chaoyue, et al.
Published: (2025)

FinWorld: An All-in-One Open-Source Platform for End-to-End Financial AI Research and Deployment
by: Zhang, Wentao, et al.
Published: (2025)

Empowering Sustainable Finance with Artificial Intelligence: A Framework for Responsible Implementation
by: Pavlidis, Georgios
Published: (2025)

Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks
by: Meng, Xianhui, et al.
Published: (2025)

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
by: Lei, Fangyu, et al.
Published: (2025)

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
by: Ma, Menghe, et al.
Published: (2026)

GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis
by: Yu, Bo, et al.
Published: (2026)

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)

AI Agents for Sustainable SMEs: A Green ESG Assessment Framework
by: Trinh, Viet, et al.
Published: (2026)

Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation
by: Lei, Yinjie, et al.
Published: (2023)

EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks
by: Yang, Xiao, et al.
Published: (2025)

A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
by: Zhang, Wentao, et al.
Published: (2024)

DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis
by: Qi, Ruyi, et al.
Published: (2026)

ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
by: Sun, Siqi, et al.
Published: (2026)

ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding
by: Dai, Xinbang, et al.
Published: (2025)

Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework
by: Lee, Sung Une, et al.
Published: (2024)

Data and System Perspectives of Sustainable Artificial Intelligence
by: Xie, Tao, et al.
Published: (2025)

ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices
by: Kong, Dezhi, et al.
Published: (2026)

SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents
by: Hu, Wentao, et al.
Published: (2026)

TestAgent: An Adaptive and Intelligent Expert for Human Assessment
by: Yu, Junhao, et al.
Published: (2025)

ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
by: Yang, Xinwei, et al.
Published: (2025)

XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition
by: Zhiren, Gong, et al.
Published: (2026)

FORTIS: Benchmarking Over-Privilege in Agent Skills
by: Li, Shawn, et al.
Published: (2026)

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
by: Chen, Jingxuan, et al.
Published: (2024)

Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents
by: Li, Jiaxing, et al.
Published: (2026)

MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment
by: Wu, Qinzhuo, et al.
Published: (2026)

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing
by: Yang, Minglai, et al.
Published: (2026)

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
by: Dong, Haonan, et al.
Published: (2026)

General-Purpose Aerial Intelligent Agents Empowered by Large Language Models
by: Zhao, Ji, et al.
Published: (2025)

Bench-CoE: a Framework for Collaboration of Experts from Benchmark
by: Wang, Yuanshuai, et al.
Published: (2024)