Bibliographic Details
Main Authors: Kang, Zhaolu; Gong, Junhao; Hu, Wenqing; Yin, Shuo; Jiang, Kehan; Fang, Zhicheng; He, Yingjie; Meng, Chunlei; Fu, Rong; Chen, Dongyang; Zheng, Leqi; Jiang, Eric Hanchen; Feng, Yunfei; Leng, Yitong; Zhu, Junfan; Chen, Xiaoyou; Yang, Xi; Xuan, Richeng
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2601.08689
Abstract:
  • Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation on quantitative finance tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them using financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate several state-of-the-art open-source and proprietary LLMs and observe substantial gaps relative to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data, demonstrating consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
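The abstract describes executing a generated strategy in a backtest and scoring it with financial performance metrics. As a rough illustration only, the Python sketch below shows what such a scoring step might look like. It is not QuantEval's framework: the function names (`backtest`, `sharpe_ratio`, `max_drawdown`), the proportional cost model, and the choice of Sharpe ratio and maximum drawdown as metrics are all assumptions made for this example, since the record does not say which metrics or cost model the benchmark uses.

```python
import math


def backtest(prices, positions, cost_rate=0.0):
    """Return per-period strategy returns (hypothetical helper).

    positions[t] is the exposure held from prices[t] to prices[t + 1]
    (for example -1, 0, or 1). cost_rate is an assumed proportional
    cost charged on each change in position.
    """
    returns = []
    prev_pos = 0.0
    for t in range(len(prices) - 1):
        asset_ret = prices[t + 1] / prices[t] - 1.0
        turnover = abs(positions[t] - prev_pos)
        returns.append(positions[t] * asset_ret - cost_rate * turnover)
        prev_pos = positions[t]
    return returns


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    std = math.sqrt(var)
    return 0.0 if std == 0 else mean / std * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Toy data: always long on a four-point price series.
prices = [100.0, 110.0, 99.0, 108.9]
positions = [1.0, 1.0, 1.0]
strategy_returns = backtest(prices, positions, cost_rate=0.001)
print(sharpe_ratio(strategy_returns), max_drawdown(strategy_returns))
```

Scoring on realized returns rather than on code similarity is the point the abstract emphasizes: two strategies that look alike as text can differ sharply once costs and drawdowns are taken into account.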