MARC view :: Library Catalog

Bibliographic details
Main authors:	Kang, Zhaolu, Gong, Junhao, Hu, Wenqing, Yin, Shuo, Jiang, Kehan, Fang, Zhicheng, He, Yingjie, Meng, Chunlei, Fu, Rong, Chen, Dongyang, Zheng, Leqi, Jiang, Eric Hanchen, Feng, Yunfei, Leng, Yitong, Zhu, Junfan, Chen, Xiaoyou, Yang, Xi, Xuan, Richeng
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2601.08689

contents	Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation in financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them using financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate several state-of-the-art open-source and proprietary LLMs and observe substantial gaps relative to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data, demonstrating consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
id	arxiv_https___arxiv_org_abs_2601_08689
institution	arXiv
record_format	arxiv
title	QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models

Documents similaires