Staff View: Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance :: Library Catalog

Bibliographic Details
Main Authors:	Qian, Lingfei; Zhou, Weipeng; Wang, Yan; Peng, Xueqing; Yi, Han; Zhao, Yilun; Huang, Jimin; Xie, Qianqian; Nie, Jian-yun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.08127

_version_	1866911005927276544
author	Qian, Lingfei; Zhou, Weipeng; Wang, Yan; Peng, Xueqing; Yi, Han; Zhao, Yilun; Huang, Jimin; Xie, Qianqian; Nie, Jian-yun
author_facet	Qian, Lingfei; Zhou, Weipeng; Wang, Yan; Peng, Xueqing; Yi, Han; Zhao, Yilun; Huang, Jimin; Xie, Qianqian; Nie, Jian-yun
contents	As the fundamental capability behind decision-making in finance, financial reasoning poses distinct challenges for LLMs. Although reinforcement learning (RL) has boosted generic reasoning, progress in finance is hindered by three gaps: no empirical study of how to build an effective financial chain-of-thought (CoT) corpus, no systematic comparison of different RL methods, and no comprehensive benchmarks. To address these gaps, we introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets by a novel three-stage pipeline that incorporates domain supervision, iterative LLM refinement, and difficulty-aware filtering. Based on FinCoT, we develop Fin-o1, the first open financial reasoning models, trained via supervised fine-tuning and GRPO-based RL. Our models outperform existing financial reasoning models and SOTA general models such as GPT-o1, DeepSeek-R1, and GPT-4.5. We also investigate the effectiveness of three different RL methods in improving domain-specific reasoning, offering the first such empirical study. Finally, we propose FinReason, the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks, and evaluate 29 LLMs on it. Our extensive experiments reveal that general reasoning models excel on standard benchmarks yet degrade noticeably in financial contexts; even finance-tuned models like Dianjin-R1 and FinR1 degrade on lengthy documents. In contrast, our Fin-o1 models consistently outperform their backbones as well as the larger GPT-o1 and DeepSeek-R1, confirming the effectiveness of our data-building and model-training strategy. Our study further shows that GRPO yields reliable gains whereas PPO and DPO do not, highlighting the need for targeted data and optimisation rather than scale alone.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_08127
institution	arXiv
publishDate	2025
record_format	arxiv
title	Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance
topic	Computation and Language
url	https://arxiv.org/abs/2502.08127

Similar Items