Staff View: OneLatent: Single-Token Compression for Visual Latent Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Lv, Bo; Sun, Yasheng; Wang, Junjie; Shi, Haoxiang
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.13738

_version_	1866912904438087680
author	Lv, Bo; Sun, Yasheng; Wang, Junjie; Shi, Haoxiang
author_facet	Lv, Bo; Sun, Yasheng; Wang, Junjie; Shi, Haoxiang
contents	Chain-of-thought (CoT) prompting improves reasoning but often increases inference cost by one to two orders of magnitude. To address these challenges, we present OneLatent, a framework that compresses intermediate reasoning into a single latent token via supervision from rendered CoT images and DeepSeek-OCR hidden states. By rendering textual steps into images, we obtain a deterministic supervision signal that can be inspected and audited without requiring the model to output verbose textual rationales. Across benchmarks, OneLatent reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT, while improving output token contribution (OTC) by 6.8×. On long-chain logical reasoning, OneLatent reaches 99.80% on ProntoQA and 97.80% on ProsQA with one latent token, with compression up to 87.4×, supporting compression-constrained generalization.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13738
institution	arXiv
publishDate	2026
record_format	arxiv
title	OneLatent: Single-Token Compression for Visual Latent Reasoning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.13738

Similar Items