Saved in:
| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.18940 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866910077472997376 |
|---|---|
| author | Zhao, Xinghao |
| author_facet | Zhao, Xinghao |
| contents | Understanding uncertainty in chain-of-thought reasoning is critical for reliable deployment of large language models. In this work, we propose a simple yet effective diagnostic approach based on trajectory shape rather than scalar magnitude. We show that this signal is practical, interpretable, and inexpensive to obtain in black-box settings, while remaining robust across models and datasets. Through extensive ablations and cross-domain replications, we demonstrate its utility for selective prediction and triage. Our findings offer a generalizable insight into uncertainty dynamics in reasoning tasks, with particular focus on numeric and discrete-answer settings. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2603_18940 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought Zhao, Xinghao Computation and Language Machine Learning Understanding uncertainty in chain-of-thought reasoning is critical for reliable deployment of large language models. In this work, we propose a simple yet effective diagnostic approach based on trajectory shape rather than scalar magnitude. We show that this signal is practical, interpretable, and inexpensive to obtain in black-box settings, while remaining robust across models and datasets. Through extensive ablations and cross-domain replications, we demonstrate its utility for selective prediction and triage. Our findings offer a generalizable insight into uncertainty dynamics in reasoning tasks, with particular focus on numeric and discrete-answer settings. |
| title | Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought |
| topic | Computation and Language Machine Learning |
| url | https://arxiv.org/abs/2603.18940 |