:: Library Catalog

Bibliographic Details
Main Authors:	Williams, Jonathan; Tureci, Esin
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.10520

Similar Items

Reasoning with Latent Thoughts: On the Power of Looped Transformers
by: Saunshi, Nikunj, et al.
Published: (2025)

Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
by: Zhang, Zheng, et al.
Published: (2025)

Personalized Generative Models for Contextual Debiasing
by: Liang, Xinran, et al.
Published: (2026)

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
by: Zhou, Zhanke, et al.
Published: (2025)

LLM Reasoning with Process Rewards for Outcome-Guided Steps
by: Rezaei, Mohammad, et al.
Published: (2026)

Reasoning to Learn from Latent Thoughts
by: Ruan, Yangjun, et al.
Published: (2025)

Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models
by: Yang, Xiao-Wen, et al.
Published: (2026)

Latent Chain-of-Thought Improves Structured-Data Transformers
by: Dudley, Carson, et al.
Published: (2026)

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
by: Chen, Haolin, et al.
Published: (2024)

CTRLS: Chain-of-Thought Reasoning via Latent State-Transition
by: Wu, Junda, et al.
Published: (2025)

TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards
by: Xiong, Xiqiao, et al.
Published: (2025)

CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
by: Butt, Natasha, et al.
Published: (2024)

Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
by: Zhong, Shuzhang, et al.
Published: (2026)

d2: Improving Reasoning in Diffusion Language Models via Trajectory Likelihood Estimation
by: Wang, Guanghan, et al.
Published: (2025)

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
by: Ye, Jiacheng, et al.
Published: (2024)

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
by: Setlur, Amrith, et al.
Published: (2024)

Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
by: Zabounidis, Renos, et al.
Published: (2025)

Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning
by: Kong, Deqian, et al.
Published: (2026)

Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing
by: Reichman, Benjamin, et al.
Published: (2026)

Feedback Loops With Language Models Drive In-Context Reward Hacking
by: Pan, Alexander, et al.
Published: (2024)

Reasoning with Latent Tokens in Diffusion Language Models
by: He, Andre, et al.
Published: (2026)

Improve Mathematical Reasoning in Language Models by Automated Process Supervision
by: Luo, Liangchen, et al.
Published: (2024)

Modeling Complex Disease Trajectories using Deep Generative Models with Semi-Supervised Latent Processes
by: Trottet, Cécile, et al.
Published: (2023)

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
by: Lyu, Chengqi, et al.
Published: (2025)

The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
by: Chen, Changyu, et al.
Published: (2024)

Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
by: Yu, Qifan, et al.
Published: (2025)

Intra-Trajectory Consistency for Reward Modeling
by: Zhou, Chaoyang, et al.
Published: (2025)

The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models
by: Welch, Robert, et al.
Published: (2026)

Trajectory Anomaly Detection with Language Models
by: Mbuya, Jonathan, et al.
Published: (2024)

A Mechanistic Analysis of Looped Reasoning Language Models
by: Blayney, Hugh, et al.
Published: (2026)

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
by: Tanmay, Kumar, et al.
Published: (2025)

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
by: Cao, Qi, et al.
Published: (2025)

Unsupervised Process Reward Models
by: Gadetsky, Artyom, et al.
Published: (2026)

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
by: Park, Taekhyun, et al.
Published: (2026)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards
by: Groeneveld, Jan Niklas, et al.
Published: (2025)

Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
by: Fein-Ashley, Jacob, et al.
Published: (2025)