MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Li, Siyi; Shi, Jiajun; Ni, Shiwen; Zhang, Ge; Li, Shuaimin; Wang, Shijian; Wen, Zhoufutu; Li, Yizhi; Alinejad-Rokny, Hamid; Liu, Jiaheng; Yang, Min; Huang, Wenhao
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2603.07078

_version_	1866914378063806464
author	Li, Siyi; Shi, Jiajun; Ni, Shiwen; Zhang, Ge; Li, Shuaimin; Wang, Shijian; Wen, Zhoufutu; Li, Yizhi; Alinejad-Rokny, Hamid; Liu, Jiaheng; Yang, Min; Huang, Wenhao
author_facet	Li, Siyi; Shi, Jiajun; Ni, Shiwen; Zhang, Ge; Li, Shuaimin; Wang, Shijian; Wen, Zhoufutu; Li, Yizhi; Alinejad-Rokny, Hamid; Liu, Jiaheng; Yang, Min; Huang, Wenhao
contents	Large Reasoning Models (LRMs) have demonstrated strong performance by producing extended Chain-of-Thought (CoT) traces before answering. However, this paradigm often induces over-reasoning: redundant calculations and circular self-verification that increase computational cost without improving outcomes. Existing evaluations largely emphasize final accuracy or coarse token counts, and lack automated tools to separate essential logic from structural redundancy. We introduce CoTJudger, a graph-driven framework that quantifies reasoning efficiency by converting free-form CoTs into directed dependency graphs and extracting the Shortest Effective Path (SEP) needed to reach a correct solution. This yields an interpretable efficiency signal -- how much of a CoT is necessary versus structurally redundant -- that is comparable across models and tasks. Evaluating 21 LRMs, CoTJudger reveals pervasive redundancy and surfaces recurring failure modes, including verification obsession and compensatory redundancy. These results provide a practical metric for disentangling reasoning ability from computational waste, enabling more targeted evaluation and diagnosis of LRM efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_07078
institution	arXiv
publishDate	2026
record_format	arxiv
title	CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2603.07078
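
The contents field describes converting a CoT into a directed dependency graph and extracting the Shortest Effective Path (SEP). This record does not give the paper's exact definition of SEP or of its efficiency score, so the following is only a minimal sketch under stated assumptions: each reasoning step is a node with a token count, an edge u -> v means step v builds on step u, the SEP is taken to be the cheapest path (in tokens) from the problem statement to the final answer, and redundancy is the share of tokens outside that path. The function names, the example graph, and the token counts are all hypothetical.

```python
import heapq


def shortest_effective_path(edges, tokens, start, goal):
    """Dijkstra over node weights: cheapest token path from start to goal.

    edges:  dict mapping a step to the steps that build on it (assumed format)
    tokens: dict mapping a step to its token count (assumed format)
    Returns (path, cost), or (None, None) if goal is unreachable.
    """
    best = {start: tokens[start]}
    prev = {}
    heap = [(tokens[start], start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in edges.get(node, []):
            new_cost = cost + tokens[nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, None
    # Walk the predecessor links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]


def redundancy_ratio(edges, tokens, start, goal):
    """Fraction of all tokens that lie outside the cheapest path (assumed metric)."""
    path, cost = shortest_effective_path(edges, tokens, start, goal)
    if path is None:
        return None
    return 1 - cost / sum(tokens.values())


# Hypothetical CoT: the model reaches the answer, then re-verifies twice.
edges = {
    "problem": ["setup"],
    "setup": ["compute"],
    "compute": ["answer", "recheck1"],
    "recheck1": ["recheck2"],
    "recheck2": ["answer"],
}
tokens = {"problem": 0, "setup": 40, "compute": 50,
          "recheck1": 50, "recheck2": 50, "answer": 10}

path, cost = shortest_effective_path(edges, tokens, "problem", "answer")
ratio = redundancy_ratio(edges, tokens, "problem", "answer")
print(path, cost, ratio)
```

In this toy graph the two re-verification steps fall outside the cheapest path, which loosely mirrors the "verification obsession" failure mode named in the abstract. A real pipeline would also need a step that parses free-form CoT text into nodes and edges, which is not sketched here.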
