Summary :: Library Catalog

Detailed Bibliography
Main author:	Cantrell, Cole
Format:	Digital resource
Language:
Published:	Zenodo, 2026
Subjects:	Machine learning; Artificial intelligence
Online access:	https://doi.org/10.5281/zenodo.20124875

Summary:

Best-of-N reasoning over multiple chains-of-thought is a standard test-time compute strategy, but naive implementations run all candidate chains to completion before selecting a winner. This work introduces a self-calibrating divergence detector that identifies when chain trajectories have meaningfully separated, paired with a hybrid disposition policy that hard-kills clearly failed chains and scaffolds borderline ones at a verifier-cost discount. The detector z-scores a gap statistic against a per-problem null distribution established during the shared-prefix grace period, which eliminates the need for dataset-specific detection thresholds. In static-label simulations on PRM800K, the mechanism reduces step-equivalent best-of-N compute by 22.8% at 99.6% winner accuracy, matching the 99.6% candidate-pool oracle (100.0% of oracle). On Math-Shepherd, where auto-rollout labels produce a much lower candidate-pool ceiling, the same architecture with identical parameters reduces compute by 13.6% at 58.4% winner accuracy, or 96.4% of the dataset’s 60.6% oracle. Across a 3×3 sensitivity sweep of the disposition parameters, winner accuracy is unchanged, compute saving varies by at most 3.3 percentage points, and no correct chain is ever killed. The mechanism adds negligible overhead relative to the inference it monitors, requires no training, and holds these oracle-relative accuracies across the whole sweep. The contribution is the mechanism, not the saving number: three layers of structure on top of exponential smoothing (per-problem null-calibrated detection, hybrid level-based disposition, literature-grounded scaffold cost) that together yield a deployment-oriented adaptive branching scheme with no learned components.
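The summary does not give the detector's formulas, so what follows is a minimal Python sketch under stated assumptions, not the author's implementation. It assumes each chain yields a per-step verifier score, smooths each chain with an exponential moving average, and uses the spread between the best and worst smoothed chain as a hypothetical gap statistic. The per-problem null (mean and standard deviation of that spread) is fitted over the first `grace` steps, and divergence is flagged once the z-score reaches `z_star`. Disposition then compares each chain's level against two thresholds. All names (`detect_and_dispose`, `step_equivalent_cost`, `kill_level`, `scaffold_level`, `scaffold_cost`) and all default values are hypothetical. Because `z_star` is expressed in standardized units relative to each problem's own null, a single value can in principle be reused across datasets, which is the property the summary claims for its detector.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class NullStats:
    """Per-problem null distribution of the gap statistic (mean, std)."""
    mu: float
    sigma: float


def ema(scores, alpha=0.5):
    """Exponentially smoothed score after each step (the baseline layer)."""
    out, s = [], None
    for x in scores:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(s)
    return out


def spread(levels):
    """Hypothetical gap statistic: best minus worst smoothed chain level."""
    return max(levels) - min(levels)


def calibrate_null(grace_spreads, eps=1e-6):
    """Fit the null from spreads seen while chains still share a prefix."""
    return NullStats(mean(grace_spreads), max(pstdev(grace_spreads), eps))


def dispose(level, kill_level, scaffold_level):
    """Hypothetical level-based rule: kill clear failures, scaffold borderline."""
    if level < kill_level:
        return "kill"
    if level < scaffold_level:
        return "scaffold"
    return "keep"


def detect_and_dispose(chain_scores, grace=3, z_star=3.0, alpha=0.5,
                       kill_level=0.2, scaffold_level=0.5):
    """chain_scores[i][t] is an assumed per-step verifier score for chain i.

    Returns (divergence step or None, one action per chain).
    """
    smoothed = [ema(s, alpha) for s in chain_scores]
    n_steps = len(chain_scores[0])
    # Null is calibrated only on the grace-period steps (requires grace >= 1).
    null = calibrate_null(
        [spread([s[t] for s in smoothed]) for t in range(grace)])
    for t in range(grace, n_steps):
        levels = [s[t] for s in smoothed]
        z = (spread(levels) - null.mu) / null.sigma
        if z >= z_star:
            actions = [dispose(lv, kill_level, scaffold_level) for lv in levels]
            leader = max(range(len(levels)), key=levels.__getitem__)
            actions[leader] = "keep"  # never prune the current best chain
            return t, actions
    # No divergence detected: every chain runs to completion (naive baseline).
    return None, ["keep"] * len(chain_scores)


def step_equivalent_cost(n_chains, n_steps, t, actions, scaffold_cost=0.5):
    """Steps spent, charging each scaffolded step `scaffold_cost` of a full one."""
    if t is None:
        return float(n_chains * n_steps)
    remaining = n_steps - (t + 1)
    cost = float(n_chains * (t + 1))  # all chains ran steps 0..t
    for a in actions:
        if a == "keep":
            cost += remaining
        elif a == "scaffold":
            cost += scaffold_cost * remaining
    return cost  # killed chains add nothing after step t


if __name__ == "__main__":
    # Toy scores invented for illustration only.
    chains = [
        [0.60, 0.62, 0.61, 0.80, 0.85, 0.90],
        [0.60, 0.61, 0.60, 0.55, 0.50, 0.50],
        [0.60, 0.60, 0.62, 0.10, 0.05, 0.05],
        [0.60, 0.61, 0.61, 0.35, 0.30, 0.30],
    ]
    t, actions = detect_and_dispose(chains, alpha=1.0)  # alpha=1.0: no smoothing
    cost = step_equivalent_cost(len(chains), len(chains[0]), t, actions)
    print(t, actions, 1 - cost / (len(chains) * len(chains[0])))
    # prints: 3 ['keep', 'keep', 'kill', 'scaffold'] 0.125
```

On this toy input the detector fires at step 3, the chain at 0.10 is killed, the chain at 0.35 is scaffolded, and charging scaffolded steps at half cost gives a 12.5% step-equivalent saving. These numbers come from invented scores and say nothing about the PRM800K or Math-Shepherd results reported above.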