Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Hu, Gang, Chen, Yating, Ding, Haiyan, Gao, Wang, Huang, Jiajia, Peng, Min, Xie, Qianqian, Yue, Kun
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.08948
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913053148184576
author	Hu, Gang Chen, Yating Ding, Haiyan Gao, Wang Huang, Jiajia Peng, Min Xie, Qianqian Yue, Kun
author_facet	Hu, Gang Chen, Yating Ding, Haiyan Gao, Wang Huang, Jiajia Peng, Min Xie, Qianqian Yue, Kun
contents	While Large Language Models (LLMs) excel in various general domains, they exhibit notable gaps in the highly specialized, knowledge-intensive, and legally regulated Chinese tax domain. Consequently, while tax-related benchmarks are gaining attention, many focus on isolated NLP tasks, neglecting real-world practical capabilities. To address this issue, we introduce TaxPraBen, the first dedicated benchmark for Chinese taxation practice. It combines 10 traditional application tasks, along with 3 pioneering real-world scenarios: tax risk prevention, tax inspection analysis, and tax strategy planning, sourced from 14 datasets totaling 7.3K instances. TaxPraBen features a scalable structured evaluation paradigm designed through process of "structured parsing-field alignment extraction-numerical and textual matching", enabling end-to-end tax practice assessment while being extensible to other domains. We evaluate 19 LLMs based on Bloom's taxonomy. The results indicate significant performance disparities: all closed-source large-parameter LLMs excel, and Chinese LLMs like Qwen2.5 generally exceed multilingual LLMs, while the YaYi2 LLM, fine-tuned with some tax data, shows only limited improvement. TaxPraBen serves as a vital resource for advancing evaluations of LLMs in practical applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_08948
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice Hu, Gang Chen, Yating Ding, Haiyan Gao, Wang Huang, Jiajia Peng, Min Xie, Qianqian Yue, Kun Computation and Language While Large Language Models (LLMs) excel in various general domains, they exhibit notable gaps in the highly specialized, knowledge-intensive, and legally regulated Chinese tax domain. Consequently, while tax-related benchmarks are gaining attention, many focus on isolated NLP tasks, neglecting real-world practical capabilities. To address this issue, we introduce TaxPraBen, the first dedicated benchmark for Chinese taxation practice. It combines 10 traditional application tasks, along with 3 pioneering real-world scenarios: tax risk prevention, tax inspection analysis, and tax strategy planning, sourced from 14 datasets totaling 7.3K instances. TaxPraBen features a scalable structured evaluation paradigm designed through process of "structured parsing-field alignment extraction-numerical and textual matching", enabling end-to-end tax practice assessment while being extensible to other domains. We evaluate 19 LLMs based on Bloom's taxonomy. The results indicate significant performance disparities: all closed-source large-parameter LLMs excel, and Chinese LLMs like Qwen2.5 generally exceed multilingual LLMs, while the YaYi2 LLM, fine-tuned with some tax data, shows only limited improvement. TaxPraBen serves as a vital resource for advancing evaluations of LLMs in practical applications.
title	TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice
topic	Computation and Language
url	https://arxiv.org/abs/2604.08948

Similar Items