Bibliographic Details
Main Authors: Qian, Lingfei, Zhou, Weipeng, Wang, Yan, Peng, Xueqing, Yi, Han, Zhao, Yilun, Huang, Jimin, Xie, Qianqian, Nie, Jian-yun
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2502.08127
author Qian, Lingfei
Zhou, Weipeng
Wang, Yan
Peng, Xueqing
Yi, Han
Zhao, Yilun
Huang, Jimin
Xie, Qianqian
Nie, Jian-yun
contents As the fundamental capability behind decision-making in finance, financial reasoning poses distinct challenges for LLMs. Although reinforcement learning (RL) has boosted generic reasoning, progress in finance is hindered by the absence of an empirical study on building an effective financial chain-of-thought (CoT) corpus, of a systematic comparison of different RL methods, and of comprehensive benchmarks. To address these gaps, we introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets by a novel three-stage pipeline that incorporates domain supervision, iterative LLM refinement, and difficulty-aware filtering. Based on FinCoT, we develop Fin-o1, the first open financial reasoning models trained via supervised fine-tuning and GRPO-based RL. Our models outperform existing financial reasoning models and SOTA general models such as GPT-o1, DeepSeek-R1, and GPT-4.5. We also investigate the effectiveness of three different RL methods in improving domain-specific reasoning, offering the first such empirical study. Finally, we propose FinReason, the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks, and evaluate 29 LLMs on it. Our extensive experiments reveal that general reasoning models excel on standard benchmarks yet degrade noticeably in financial contexts; even finance-tuned models such as Dianjin-R1 and FinR1 degrade on lengthy documents. In contrast, our Fin-o1 models consistently outperform their backbones as well as the larger GPT-o1 and DeepSeek-R1, confirming the effectiveness of our data construction and model training strategy. Our study further shows that GRPO yields reliable gains whereas PPO and DPO do not, highlighting the need for targeted data and optimisation rather than scale alone.
format Preprint
id arxiv_https___arxiv_org_abs_2502_08127
institution arXiv
publishDate 2025
record_format arxiv
title Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance
topic Computation and Language
url https://arxiv.org/abs/2502.08127