Bibliographic Details
Main Author: Vachharajani, Pranav
Format: Digital resource
Language: English
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19553597
Abstract:
  • <p>Large Language Models (LLMs) increasingly rely on extended inference-time computation to solve complex tasks. However, longer reasoning does not guarantee correctness: models often follow flawed premises, ethical oversimplifications, or self-referential loops, producing confident but incorrect outputs after substantial compute expenditure. Existing supervision approaches predominantly evaluate final answers, providing no mechanism to intervene once a reasoning trajectory has already diverged.<br>We propose a Dual-Model Process Supervision Framework that introduces a lightweight Observer model to monitor and evaluate intermediate reasoning segments produced by a high-capability Student model during inference. Rather than supervising outcomes, the Observer performs process-level auditing, selectively intervening when semantic failure modes (such as invalid premises, circular reasoning, or ethical oversimplification) are detected. We formalize an Optimal Intervention Point (OIP) as a fixed semantic checkpoint that enables early termination of flawed reasoning trajectories while preserving benign exploratory reasoning.<br>Through controlled ablation experiments across business strategy, logical reasoning, ethical dilemmas, and paradoxical tasks, we demonstrate that process supervision (i) achieves 84% precision in detecting flawed reasoning with 100% recall on logic traps, (ii) reduces inference-time token consumption by 44%, and (iii) maintains a 60% pass-through rate for valid exploratory reasoning. Our results suggest that reliable reasoning requires not merely thinking longer, but thinking under supervision.</p>
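The Observer/Student loop described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the record: `supervise`, `toy_observer`, `Verdict`, and the `checkpoint` parameter are illustrative assumptions, and the verbatim-repetition check is only a toy stand-in for whatever semantic auditing the actual Observer model performs. The sketch assumes the Student's reasoning arrives as a stream of text segments and that the audit happens once, at a fixed segment index.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Verdict:
    """Result of one Observer audit (hypothetical structure)."""
    flawed: bool
    reason: Optional[str] = None


def toy_observer(prefix: List[str]) -> Verdict:
    """Toy stand-in for an Observer model.

    Assumption: flags "circular reasoning" when a segment repeats verbatim
    (case-insensitive). A real Observer would judge semantics, not strings.
    """
    seen = set()
    for segment in prefix:
        key = segment.strip().lower()
        if key in seen:
            return Verdict(True, "circular reasoning")
        seen.add(key)
    return Verdict(False)


def supervise(
    segments: Iterable[str],
    observer: Callable[[List[str]], Verdict],
    checkpoint: int,
) -> Tuple[List[str], Verdict]:
    """Consume Student segments and audit once at a fixed checkpoint.

    If the Observer flags the prefix, stop early so later segments are
    never requested; otherwise let the reasoning run to completion.
    """
    kept: List[str] = []
    for index, segment in enumerate(segments, start=1):
        kept.append(segment)
        if index == checkpoint:
            verdict = observer(kept)
            if verdict.flawed:
                return kept, verdict  # early termination
    return kept, Verdict(False)


if __name__ == "__main__":
    looping = [
        "A holds because B.",
        "B holds because A.",
        "A holds because B.",
        "Therefore A.",
        "Done.",
    ]
    kept, verdict = supervise(iter(looping), toy_observer, checkpoint=3)
    print(len(kept), verdict)
```

Because `supervise` pulls segments lazily from an iterator, a flagged trajectory stops consuming the stream at the checkpoint, which is the mechanism by which early termination would save tokens in a setup like this one.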