Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Ma, Shijian, Lin, Yan, Yang, Yi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Computation and Language
Online Access:	https://arxiv.org/abs/2601.09142
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911421606920192
author	Ma, Shijian Lin, Yan Yang, Yi
author_facet	Ma, Shijian Lin, Yan Yang, Yi
contents	We present EvasionBench, a comprehensive benchmark for detecting evasive responses in corporate earnings call question-and-answer sessions. Drawing from 22.7 million Q&A pairs extracted from S&P Capital IQ transcripts, we construct a rigorously filtered dataset and introduce a three-level evasion taxonomy: direct, intermediate, and fully evasive. Our annotation pipeline employs a Multi-Model Consensus (MMC) framework, combining dual frontier LLM annotation with a three-judge majority voting mechanism for ambiguous cases, achieving a Cohen's Kappa of 0.835 on human inter-annotator agreement. We release: (1) a balanced 84K training set, (2) a 1K gold-standard evaluation set with expert human labels, and (3) [Eva-4B], a 4-billion parameter classifier fine-tuned from Qwen3-4B that achieves 84.9% Macro-F1, outperforming Claude 4.5, GPT-5.2, and Gemini 3 Flash. Our ablation studies demonstrate the effectiveness of multi-model consensus labeling over single-model annotation. EvasionBench fills a critical gap in financial NLP by providing the first large-scale benchmark specifically targeting managerial communication evasion.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_09142
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A Ma, Shijian Lin, Yan Yang, Yi Machine Learning Computation and Language We present EvasionBench, a comprehensive benchmark for detecting evasive responses in corporate earnings call question-and-answer sessions. Drawing from 22.7 million Q&A pairs extracted from S&P Capital IQ transcripts, we construct a rigorously filtered dataset and introduce a three-level evasion taxonomy: direct, intermediate, and fully evasive. Our annotation pipeline employs a Multi-Model Consensus (MMC) framework, combining dual frontier LLM annotation with a three-judge majority voting mechanism for ambiguous cases, achieving a Cohen's Kappa of 0.835 on human inter-annotator agreement. We release: (1) a balanced 84K training set, (2) a 1K gold-standard evaluation set with expert human labels, and (3) [Eva-4B], a 4-billion parameter classifier fine-tuned from Qwen3-4B that achieves 84.9% Macro-F1, outperforming Claude 4.5, GPT-5.2, and Gemini 3 Flash. Our ablation studies demonstrate the effectiveness of multi-model consensus labeling over single-model annotation. EvasionBench fills a critical gap in financial NLP by providing the first large-scale benchmark specifically targeting managerial communication evasion.
title	EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A
topic	Machine Learning Computation and Language
url	https://arxiv.org/abs/2601.09142

Similar Items