| Main Authors: | Liu, Yong, Liu, Ximan, Yang, Guoqing, Bai, Bing, Xu, Xiaoqiang, Chen, Zhen, Zhang, Ke, Li, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.13173 |
Similar Items
EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
by: Min, Rui, et al.
Published: (2025)
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
by: Ma, Kaijing, et al.
Published: (2024)
DriftBench: Defining and Generating Data and Query Workload Drift for Benchmarking
by: Liu, Guanli, et al.
Published: (2025)
NeurBench: A Benchmark Suite for Learned Database Components with Drift Modeling
by: Zhao, Zhanhao, et al.
Published: (2025)
PGB: Benchmarking Differentially Private Synthetic Graph Generation Algorithms
by: Liu, Shang, et al.
Published: (2024)
LST-Bench: Benchmarking Log-Structured Tables in the Cloud
by: Camacho-Rodríguez, Jesús, et al.
Published: (2023)
PandasBench: A Benchmark for the Pandas API
by: Broihier, Alex, et al.
Published: (2025)
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
by: Zhou, Chenyu, et al.
Published: (2025)
EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting
by: Panja, Madhurima, et al.
Published: (2026)
Verifiable, Efficient and Confidentiality-Preserving Graph Search with Transparency
by: Wang, Qiuhao, et al.
Published: (2025)
SemBench: A Benchmark for Semantic Query Processing Engines
by: Lao, Jiale, et al.
Published: (2025)
DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning
by: Ahmed, Ahmed G. A. H, et al.
Published: (2026)
DP-Bench: A Benchmark for Evaluating Data Product Creation Systems
by: Chowdhury, Faisal, et al.
Published: (2025)
CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases
by: Chronis, Yannis, et al.
Published: (2024)
FOCUS: Boosting Schema-aware Access for KV Stores via Hierarchical Data Management
by: Liu, Zhen, et al.
Published: (2025)
Revisiting Data Analysis with Pre-trained Foundation Models
by: Liang, Chen, et al.
Published: (2025)
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine
by: Xie, Jiacheng, et al.
Published: (2025)
ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities
by: Zanoli, Christopher, et al.
Published: (2026)
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
by: Jin, Tengjun, et al.
Published: (2025)
Benchmarking Time Series Databases with IoTDB-Benchmark for IoT Scenarios
by: Liu, Rui, et al.
Published: (2019)
TokBench: Evaluating Your Visual Tokenizer before Visual Generation
by: Wu, Junfeng, et al.
Published: (2025)
EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
by: Wang, Weiqi, et al.
Published: (2025)
RADAR: Benchmarking Language Models on Imperfect Tabular Data
by: Gu, Ken, et al.
Published: (2025)
RelBench: A Benchmark for Deep Learning on Relational Databases
by: Robinson, Joshua, et al.
Published: (2024)
TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval
by: Xu, Wenbo, et al.
Published: (2024)
CrypQ: A Database Benchmark Based on Dynamic, Ever-Evolving Ethereum Data
by: Capol, Vincent, et al.
Published: (2024)
FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting
by: Ji, Fengxian, et al.
Published: (2026)
UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data
by: Weng, Han, et al.
Published: (2025)
PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing
by: Agnihotri, Pratyush, et al.
Published: (2025)
EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce
by: Yu, Minhyeong, et al.
Published: (2026)
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
by: Tang, Zirui, et al.
Published: (2026)
PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking
by: Zhou, Yan, et al.
Published: (2025)
ResBench: A Comprehensive Framework for Evaluating Database Resilience
by: Hu, Puyun, et al.
Published: (2025)
OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models
by: Lin, Ling, et al.
Published: (2026)
MatTools: Benchmarking Large Language Models for Materials Science Tools
by: Liu, Siyu, et al.
Published: (2025)
Towards FAIR and federated Data Ecosystems for interdisciplinary Research
by: Beyvers, Sebastian, et al.
Published: (2025)
Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits
by: Zheng, Zhuanglin, et al.
Published: (2026)
Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture
by: Xu, Jingyu, et al.
Published: (2024)
PersonalHomeBench: Evaluating Agents in Personalized Smart Homes
by: Bharadwaj, Manasa, et al.
Published: (2026)
Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science
by: Vargas-Solar, Genoveva, et al.
Published: (2025)