Bibliographic Details
Main Author:	Vachharajani, Pranav
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	Artificial intelligence
Online Access:	https://doi.org/10.5281/zenodo.19553597

Table of Contents:

Large Language Models (LLMs) increasingly rely on extended inference-time computation to solve complex tasks. However, longer reasoning does not guarantee correctness: models often follow flawed premises, ethical oversimplifications, or self-referential loops, producing confident but incorrect outputs after substantial compute expenditure. Existing supervision approaches predominantly evaluate final answers, providing no mechanism to intervene once a reasoning trajectory has already diverged. We propose a Dual-Model Process Supervision Framework that introduces a lightweight Observer model to monitor and evaluate intermediate reasoning segments produced by a high-capability Student model during inference. Rather than supervising outcomes, the Observer performs process-level auditing, selectively intervening when semantic failure modes (such as invalid premises, circular reasoning, or ethical oversimplification) are detected. We formalize an Optimal Intervention Point (OIP) as a fixed semantic checkpoint that enables early termination of flawed reasoning trajectories while preserving benign exploratory reasoning. Through controlled ablation experiments across business strategy, logical reasoning, ethical dilemmas, and paradoxical tasks, we demonstrate that process supervision (i) achieves 84% precision in detecting flawed reasoning with 100% recall on logic traps, (ii) reduces inference-time token consumption by 44%, and (iii) maintains a 60% pass-through rate for valid exploratory reasoning. Our results suggest that reliable reasoning requires not merely thinking longer, but thinking under supervision.
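
The Observer/Student loop described in the abstract might be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the names `supervise`, `toy_observer`, `Verdict`, and `RunResult` are hypothetical, the checkpoint is assumed to be a fixed segment index, tokens are approximated by whitespace-separated words, and the toy Observer only detects verbatim repetition as a stand-in for circular reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Verdict:
    """Hypothetical Observer output: was the trajectory flagged, and why."""
    flawed: bool
    mode: Optional[str] = None


@dataclass
class RunResult:
    """Hypothetical record of one supervised reasoning run."""
    segments: List[str]
    terminated_early: bool
    mode: Optional[str]
    tokens_used: int


def supervise(
    segments: Iterable[str],
    observer: Callable[[List[str]], Verdict],
    checkpoint: int,
) -> RunResult:
    """Consume Student segments lazily and audit once at a fixed checkpoint.

    Assumption: the checkpoint is the 1-based index of the segment after
    which the Observer is consulted. If the Observer flags the trajectory,
    the loop stops and no further segments are requested from the Student.
    """
    seen: List[str] = []
    tokens = 0
    for index, segment in enumerate(segments, start=1):
        seen.append(segment)
        tokens += len(segment.split())  # crude token proxy (assumption)
        if index == checkpoint:
            verdict = observer(seen)
            if verdict.flawed:
                return RunResult(seen, True, verdict.mode, tokens)
    return RunResult(seen, False, None, tokens)


def toy_observer(seen: List[str]) -> Verdict:
    """Toy stand-in for the Observer: flags a verbatim repeated segment."""
    normalized = [s.strip().lower() for s in seen]
    if len(set(normalized)) < len(normalized):
        return Verdict(True, "circular_reasoning")
    return Verdict(False)
```

Passing the Student's output as a generator, for example `supervise(student_segments(), toy_observer, checkpoint=3)` with a hypothetical `student_segments` generator, means that an early termination also avoids producing the remaining segments. Under these assumptions, that is where any token saving would come from.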
