| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.19553597 |

Table of contents:

Large Language Models (LLMs) increasingly rely on extended inference-time computation to solve complex tasks. However, longer reasoning does not guarantee correctness: models often accept flawed premises, fall into ethical oversimplification, or get caught in self-referential loops, producing confident but incorrect outputs after substantial compute expenditure. Existing supervision approaches predominantly evaluate final answers and provide no mechanism to intervene once a reasoning trajectory has already diverged.

We propose a Dual-Model Process Supervision Framework in which a lightweight Observer model monitors and evaluates the intermediate reasoning segments that a high-capability Student model produces during inference. Rather than supervising outcomes, the Observer performs process-level auditing and intervenes selectively when it detects semantic failure modes such as invalid premises, circular reasoning, or ethical oversimplification. We formalize an Optimal Intervention Point (OIP) as a fixed semantic checkpoint that enables early termination of flawed reasoning trajectories while preserving benign exploratory reasoning.

Through controlled ablation experiments across business strategy, logical reasoning, ethical dilemmas, and paradoxical tasks, we demonstrate that process supervision (i) achieves 84% precision in detecting flawed reasoning, with 100% recall on logic traps; (ii) reduces inference-time token consumption by 44%; and (iii) maintains a 60% pass-through rate for valid exploratory reasoning. Our results suggest that reliable reasoning requires not merely thinking longer, but thinking under supervision.
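The abstract does not specify interfaces for the Observer or the Student, so the following is only a minimal sketch of the segment-level supervision loop under stated assumptions: the Student's output arrives as a lazy stream of text segments; the OIP is modeled as a fixed segment index `oip` from which auditing begins (segments before it pass unaudited as exploratory reasoning); the Observer is any callable that returns a failure mode or `None`; and tokens are approximated by whitespace splitting. The names `supervise`, `repeat_observer`, `FailureMode`, and `SupervisionResult` are hypothetical, and the toy Observer only catches verbatim repetition as a crude stand-in for circular reasoning, not the paper's Observer model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Optional


class FailureMode(Enum):
    """Semantic failure modes named in the abstract."""
    INVALID_PREMISE = "invalid premise"
    CIRCULAR_REASONING = "circular reasoning"
    ETHICAL_OVERSIMPLIFICATION = "ethical oversimplification"


# Hypothetical Observer signature: given the accepted history and a new
# segment, return a FailureMode if the segment is flawed, else None.
Observer = Callable[[List[str], str], Optional[FailureMode]]


@dataclass
class SupervisionResult:
    kept: List[str]                 # segments accepted before any intervention
    tokens: int                     # crude token count (whitespace split), an assumption
    failure: Optional[FailureMode]  # None means the trace passed through


def supervise(segments: Iterable[str], observer: Observer, oip: int) -> SupervisionResult:
    """Audit Student segments from index `oip` onward; stop at the first flaw.

    Modeling the OIP as a segment index is an assumption for illustration.
    """
    kept: List[str] = []
    tokens = 0
    for i, seg in enumerate(segments):
        tokens += len(seg.split())  # the Student has already spent these tokens
        if i >= oip:
            failure = observer(kept, seg)
            if failure is not None:
                # Early termination: no further segments are pulled from the
                # Student, so no more tokens are spent on this trajectory.
                return SupervisionResult(kept, tokens, failure)
        kept.append(seg)
    return SupervisionResult(kept, tokens, None)


def repeat_observer(history: List[str], seg: str) -> Optional[FailureMode]:
    """Toy stand-in Observer: flags a segment that repeats an earlier one verbatim."""
    return FailureMode.CIRCULAR_REASONING if seg in history else None


if __name__ == "__main__":
    trace = ["A implies B", "B implies C", "A implies B", "so C holds"]
    result = supervise(iter(trace), repeat_observer, oip=1)
    print(result.failure.name, result.tokens, len(result.kept))
    # prints: CIRCULAR_REASONING 9 2
```

Passing an iterator rather than a list matters for the token argument: because `supervise` returns as soon as the Observer flags a segment, later segments (here `"so C holds"`) are never requested. With `oip=3` the same trace passes through unflagged, since the repeated segment falls before the checkpoint, which mirrors the trade-off between early termination and preserving exploratory reasoning.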