Bibliographic Details
Main Authors: Kang, Zhaolu, Gong, Junhao, Hu, Wenqing, Yin, Shuo, Jiang, Kehan, Fang, Zhicheng, He, Yingjie, Meng, Chunlei, Fu, Rong, Chen, Dongyang, Zheng, Leqi, Jiang, Eric Hanchen, Feng, Yunfei, Leng, Yitong, Zhu, Junfan, Chen, Xiaoyou, Yang, Xi, Xuan, Richeng
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2601.08689
_version_ 1866918291481559040
author Kang, Zhaolu
Gong, Junhao
Hu, Wenqing
Yin, Shuo
Jiang, Kehan
Fang, Zhicheng
He, Yingjie
Meng, Chunlei
Fu, Rong
Chen, Dongyang
Zheng, Leqi
Jiang, Eric Hanchen
Feng, Yunfei
Leng, Yitong
Zhu, Junfan
Chen, Xiaoyou
Yang, Xi
Xuan, Richeng
contents Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation in financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them using financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate a selection of state-of-the-art open-source and proprietary LLMs and observe substantial gaps relative to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data, demonstrating consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
format Preprint
id arxiv_https___arxiv_org_abs_2601_08689
institution arXiv
publishDate 2026
record_format arxiv
title QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models
topic Computation and Language
url https://arxiv.org/abs/2601.08689
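The abstract describes a backtesting framework that executes a generated strategy and scores it with financial performance metrics under a fixed cost model. The following is a minimal sketch of that general idea, not QuantEval's actual framework. Every name in it (`moving_average_signal`, `backtest`, `sharpe_ratio`, `max_drawdown`), the proportional cost model, the single-asset universe, and the 252-period annualization are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical signature for a strategy: given the price history and the
# current index t, return a target position of +1 (long), -1 (short), or 0.
Signal = Callable[[Sequence[float], int], int]


def moving_average_signal(prices: Sequence[float], t: int,
                          fast: int = 3, slow: int = 5) -> int:
    """Illustrative trend-following rule (assumed, not from the paper).

    Uses only prices up to and including index t, so there is no look-ahead.
    """
    if t + 1 < slow:
        return 0  # not enough history yet, stay flat
    fast_ma = sum(prices[t + 1 - fast:t + 1]) / fast
    slow_ma = sum(prices[t + 1 - slow:t + 1]) / slow
    if fast_ma > slow_ma:
        return 1
    if fast_ma < slow_ma:
        return -1
    return 0


def backtest(prices: Sequence[float], signal: Signal,
             cost_rate: float = 0.0005) -> List[float]:
    """Run the strategy and return per-period net returns.

    Assumed cost model: a proportional fee charged on each change in position.
    """
    returns: List[float] = []
    position = 0
    for t in range(len(prices) - 1):
        target = signal(prices, t)
        cost = cost_rate * abs(target - position)
        position = target
        asset_return = prices[t + 1] / prices[t] - 1.0
        returns.append(position * asset_return - cost)
    return returns


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    if variance == 0.0:
        return 0.0
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


if __name__ == "__main__":
    # Made-up price series, for demonstration only.
    demo_prices = [100, 101, 103, 102, 105, 107, 106, 104, 101, 99, 100]
    net = backtest(demo_prices, moving_average_signal)
    print("Sharpe:", sharpe_ratio(net), "Max drawdown:", max_drawdown(net))
```

Because the price series, cost rate, and metric definitions are all fixed inputs, rerunning the sketch gives identical results, which is the sense in which a deterministic backtesting configuration supports reproducibility.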