Bibliographic Details
Main Authors: Li, Siyi; Shi, Jiajun; Ni, Shiwen; Zhang, Ge; Li, Shuaimin; Wang, Shijian; Wen, Zhoufutu; Li, Yizhi; Alinejad-Rokny, Hamid; Liu, Jiaheng; Yang, Min; Huang, Wenhao
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2603.07078
_version_ 1866914378063806464
author Li, Siyi
Shi, Jiajun
Ni, Shiwen
Zhang, Ge
Li, Shuaimin
Wang, Shijian
Wen, Zhoufutu
Li, Yizhi
Alinejad-Rokny, Hamid
Liu, Jiaheng
Yang, Min
Huang, Wenhao
contents Large Reasoning Models (LRMs) have demonstrated strong performance by producing extended Chain-of-Thought (CoT) traces before answering. However, this paradigm often induces over-reasoning: redundant calculations and circular self-verification that increase computational cost without improving outcomes. Existing evaluations largely emphasize final accuracy or coarse token counts, and lack automated tools to separate essential logic from structural redundancy. We introduce CoTJudger, a graph-driven framework that quantifies reasoning efficiency by converting free-form CoTs into directed dependency graphs and extracting the Shortest Effective Path (SEP) needed to reach a correct solution. This yields an interpretable efficiency signal -- how much of a CoT is necessary versus structurally redundant -- that is comparable across models and tasks. Evaluating 21 LRMs, CoTJudger reveals pervasive redundancy and surfaces recurring failure modes, including verification obsession and compensatory redundancy. These results provide a practical metric for disentangling reasoning ability from computational waste, enabling more targeted evaluation and diagnosis of LRM efficiency.
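The abstract describes converting a CoT into a directed dependency graph and extracting a Shortest Effective Path (SEP), but this record does not say how the graph is built or how the efficiency signal is computed. The following is only a toy sketch of the general idea, under assumptions that are not taken from the paper: reasoning steps are already segmented by hand, path length is measured in hops using breadth-first search, and redundancy is the share of tokens lying off the path. The names `shortest_effective_path`, `redundancy_ratio`, and the step labels are hypothetical.

```python
from collections import deque


def shortest_effective_path(edges, start, goal):
    """Return the fewest-hop path from start to goal in a directed graph.

    `edges` is a list of (source, target) pairs meaning "target depends on
    source". This is a plain BFS, used here as a stand-in; the paper's
    actual SEP extraction may differ.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable from start


def redundancy_ratio(step_tokens, path):
    """Fraction of all tokens spent on steps that are not on the path."""
    total = sum(step_tokens.values())
    on_path = sum(step_tokens[step] for step in path)
    return 1 - on_path / total


# Hypothetical, hand-segmented CoT: step name -> token count.
step_tokens = {
    "problem": 20,
    "setup": 40,
    "compute": 60,
    "recheck": 80,
    "recheck_again": 50,
    "answer": 10,
}
# The trace reaches the answer directly after "compute", but also wanders
# through two rounds of self-verification before restating it.
edges = [
    ("problem", "setup"),
    ("setup", "compute"),
    ("compute", "answer"),
    ("compute", "recheck"),
    ("recheck", "recheck_again"),
    ("recheck_again", "answer"),
]

path = shortest_effective_path(edges, "problem", "answer")
ratio = redundancy_ratio(step_tokens, path)
print(path, ratio)
```

In this toy trace the two verification steps fall outside the extracted path, so half of the tokens count as redundant. A real pipeline would also need an automated way to segment the CoT and infer dependencies, which the abstract mentions but this record does not detail.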
format Preprint
id arxiv_https___arxiv_org_abs_2603_07078
institution arXiv
publishDate 2026
record_format arxiv
title CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs
topic Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2603.07078