Staff View: FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering :: Library Catalog

Bibliographic Details
Main Authors:	Xue, Siqiao; Li, Xiaojing; Zhou, Fan; Dai, Qingyang; Chu, Zhixuan; Mei, Hongyuan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.04526

_version_	1866908363987615744
author	Xue, Siqiao; Li, Xiaojing; Zhou, Fan; Dai, Qingyang; Chu, Zhixuan; Mei, Hongyuan
author_facet	Xue, Siqiao; Li, Xiaojing; Zhou, Fan; Dai, Qingyang; Chu, Zhixuan; Mei, Hongyuan
contents	In this paper, we introduce FAMMA, an open-source benchmark for financial multilingual multimodal question answering (QA). Our benchmark aims to evaluate the abilities of large language models (LLMs) in answering complex reasoning questions that require advanced financial knowledge. The benchmark has two versions: FAMMA-Basic consists of 1,945 questions extracted from university textbooks and exams, along with human-annotated answers and rationales; FAMMA-LivePro consists of 103 novel questions created by human domain experts, with answers and rationales held out from the public for a contamination-free evaluation. These questions cover advanced knowledge of 8 major subfields in finance (e.g., corporate finance, derivatives, and portfolio management). Most questions are in English; some are in Chinese or French. Each question includes non-text data such as charts, diagrams, or tables. Our experiments reveal that FAMMA poses a significant challenge for LLMs, including reasoning models such as GPT-o1 and DeepSeek-R1. Additionally, we curated 1,270 reasoning trajectories of DeepSeek-R1 on the FAMMA-Basic data and fine-tuned a series of open-source Qwen models on this reasoning data. We found that training a model on these reasoning trajectories can significantly improve its performance on FAMMA-LivePro. We released our leaderboard, data, code, and trained models at https://famma-bench.github.io/famma/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_04526
institution	arXiv
publishDate	2024
record_format	arxiv
title	FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2410.04526

Similar Items