| Main Author: | Panda, Silu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29586 |
Similar Items
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models
by: Liu, Shu, et al.
Published: (2024)
FinBen: A Holistic Financial Benchmark for Large Language Models
by: Xie, Qianqian, et al.
Published: (2024)
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications
by: Cao, Yupeng, et al.
Published: (2025)
IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text
by: Pall, Rajveer Singh
Published: (2026)
Financial Statement Analysis with Large Language Models
by: Kim, Alex, et al.
Published: (2024)
FinReflectKG -- EvalBench: Benchmarking Financial KG with Multi-Dimensional Evaluation
by: Dimino, Fabrizio, et al.
Published: (2025)
FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis
by: Nguyen, Quang Hung, et al.
Published: (2025)
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
by: Choi, Chanyeol, et al.
Published: (2025)
FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol
by: Zhu, Jie, et al.
Published: (2026)
FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles
by: Malarkkan, Arun Vignesh, et al.
Published: (2026)
FinForge: Semi-Synthetic Financial Benchmark Generation
by: Matlin, Glenn, et al.
Published: (2026)
FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models
by: Yuan, Ziqiang, et al.
Published: (2024)
FinTradeBench: A Financial Reasoning Benchmark for LLMs
by: Agrawal, Yogesh, et al.
Published: (2026)
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
by: Lu, Guilong, et al.
Published: (2025)
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
by: Lu, Jiaxuan, et al.
Published: (2026)
FinLLM-B: When Large Language Models Meet Financial Breakout Trading
by: Zhang, Kang, et al.
Published: (2024)
FinS-Pilot: A Benchmark for Online Financial RAG System
by: Wang, Feng, et al.
Published: (2025)
BizFinBench.v2: A Unified Dual-Mode Bilingual Benchmark for Expert-Level Financial Capability Alignment
by: Guo, Xin, et al.
Published: (2026)
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
by: Bhatia, Gagan, et al.
Published: (2024)
FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning
by: Deng, Shuangyan, et al.
Published: (2025)
FinSheet-Bench: From Simple Lookups to Complex Reasoning, Where LLMs Break on Financial Spreadsheets
by: Ravnik, Jan, et al.
Published: (2026)
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
by: Magomere, Jabez, et al.
Published: (2025)
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
by: Yang, Zhi, et al.
Published: (2026)
OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)
BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems
by: Taheri, Ali, et al.
Published: (2025)
Deep FinResearch Bench: Evaluating AI's Ability to Conduct Professional Financial Investment Research
by: Haque, Mirazul, et al.
Published: (2026)
AccessEval: Benchmarking Disability Bias in Large Language Models
by: Panda, Srikant, et al.
Published: (2025)
ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models
by: Wang, Yuhang, et al.
Published: (2026)
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
by: Zheng, Yushuo, et al.
Published: (2026)
PetroBench: A Benchmark for Large Language Models in Petroleum Engineering
by: Wang, Xiang, et al.
Published: (2026)
SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy
by: Xiao, Peiyao, et al.
Published: (2026)
SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
by: Xu, Peiran, et al.
Published: (2025)
TaskBench: Benchmarking Large Language Models for Task Automation
by: Shen, Yongliang, et al.
Published: (2023)
FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents
by: Kim, Eric Y., et al.
Published: (2026)
FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information
by: Wang, Yan, et al.
Published: (2025)
FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data
by: Sinha, Ankur, et al.
Published: (2025)
Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models
by: Berger, Armin, et al.
Published: (2025)
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models
by: Zhou, Xiyuan, et al.
Published: (2024)
TurkBench: A Benchmark for Evaluating Turkish Large Language Models
by: Toraman, Çağrı, et al.
Published: (2026)
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
by: Xie, Zhuohan, et al.
Published: (2025)