Bibliographic Details
Main Author: Materzok, Tobias
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2601.21169
Table of Contents:
  • We introduce Output-Space Search (OS-Search), which turns LLM generation into endpoint search. An outer loop selects a target z* in a frozen encoder-defined 3D output space Z, and a retrieval-grounded policy trained with sequence-level RL generates outputs whose coordinates land near z* under standard autoregressive decoding. This enables parallel sweeps and black-box optimization in Z without path-dependent token/program search. On stories, sweeping Z (text) yields 3.1x higher LLM-scored diversity than prompt-chaining. On code, Bayesian optimization over Z (code) improves an objective withheld from the controller under matched inference budgets while preserving validity.
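The loop described in the abstract (an outer controller picks a target z* in a 3D space Z, a generator produces an output, and a frozen encoder maps that output back to coordinates) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `encode` is a hand-written stand-in for the frozen encoder, `generate` is a nearest-neighbour lookup over a small pool standing in for the retrieval-grounded RL-trained policy, and `sweep` is a plain grid sweep standing in for the outer loop.

```python
import itertools
import math

def encode(text):
    """Hypothetical stand-in for the frozen encoder: text -> point in [0, 1]^3."""
    n = max(len(text), 1)
    vowels = sum(c in "aeiou" for c in text.lower())
    spaces = text.count(" ")
    # Three crude features: length, vowel ratio, word spacing (all clipped to [0, 1]).
    return (min(n / 40.0, 1.0), vowels / n, min(spaces / 8.0, 1.0))

def generate(target, pool):
    """Hypothetical stand-in for the policy: pick the pool item encoded closest to target."""
    return min(pool, key=lambda t: math.dist(encode(t), target))

def sweep(pool, steps=3):
    """Outer loop: visit a regular grid of targets z* in Z and record where outputs land."""
    grid = [i / (steps - 1) for i in range(steps)]
    results = {}
    for target in itertools.product(grid, repeat=3):
        out = generate(target, pool)
        # Store the output and its distance from the requested target.
        results[target] = (out, math.dist(encode(out), target))
    return results

if __name__ == "__main__":
    pool = ["a cat", "the quick brown fox jumps over the lazy dog", "rhythm", "aeiou aeiou"]
    for target, (out, err) in sweep(pool).items():
        print(target, repr(out), round(err, 3))
```

Because each target is handled independently, the loop body could run in parallel, and the grid could be swapped for a black-box optimizer that proposes the next z* from the distances or scores already observed.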