| Main Authors: | Williams, Jonathan; Tureci, Esin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.10520 |
Similar Items
Reasoning with Latent Thoughts: On the Power of Looped Transformers
by: Saunshi, Nikunj, et al.
Published: (2025)
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
by: Zhang, Zheng, et al.
Published: (2025)
Personalized Generative Models for Contextual Debiasing
by: Liang, Xinran, et al.
Published: (2026)
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
by: Zhou, Zhanke, et al.
Published: (2025)
LLM Reasoning with Process Rewards for Outcome-Guided Steps
by: Rezaei, Mohammad, et al.
Published: (2026)
Reasoning to Learn from Latent Thoughts
by: Ruan, Yangjun, et al.
Published: (2025)
Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models
by: Yang, Xiao-Wen, et al.
Published: (2026)
Latent Chain-of-Thought Improves Structured-Data Transformers
by: Dudley, Carson, et al.
Published: (2026)
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
by: Chen, Haolin, et al.
Published: (2024)
CTRLS: Chain-of-Thought Reasoning via Latent State-Transition
by: Wu, Junda, et al.
Published: (2025)
TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards
by: Xiong, Xiqiao, et al.
Published: (2025)
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
by: Butt, Natasha, et al.
Published: (2024)
Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
by: Zhong, Shuzhang, et al.
Published: (2026)
d2: Improving Reasoning in Diffusion Language Models via Trajectory Likelihood Estimation
by: Wang, Guanghan, et al.
Published: (2025)
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
by: Ye, Jiacheng, et al.
Published: (2024)
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
by: Setlur, Amrith, et al.
Published: (2024)
Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
by: Zabounidis, Renos, et al.
Published: (2025)
Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning
by: Kong, Deqian, et al.
Published: (2026)
Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing
by: Reichman, Benjamin, et al.
Published: (2026)
Feedback Loops With Language Models Drive In-Context Reward Hacking
by: Pan, Alexander, et al.
Published: (2024)
Reasoning with Latent Tokens in Diffusion Language Models
by: He, Andre, et al.
Published: (2026)
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
by: Luo, Liangchen, et al.
Published: (2024)
Modeling Complex Disease Trajectories using Deep Generative Models with Semi-Supervised Latent Processes
by: Trottet, Cécile, et al.
Published: (2023)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
by: Lyu, Chengqi, et al.
Published: (2025)
The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
by: Chen, Changyu, et al.
Published: (2024)
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
by: Yu, Qifan, et al.
Published: (2025)
Intra-Trajectory Consistency for Reward Modeling
by: Zhou, Chaoyang, et al.
Published: (2025)
The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models
by: Welch, Robert, et al.
Published: (2026)
Trajectory Anomaly Detection with Language Models
by: Mbuya, Jonathan, et al.
Published: (2024)
A Mechanistic Analysis of Looped Reasoning Language Models
by: Blayney, Hugh, et al.
Published: (2026)
ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
by: Tanmay, Kumar, et al.
Published: (2025)
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
by: Cao, Qi, et al.
Published: (2025)
Unsupervised Process Reward Models
by: Gadetsky, Artyom, et al.
Published: (2026)
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
by: Park, Taekhyun, et al.
Published: (2026)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards
by: Groeneveld, Jan Niklas, et al.
Published: (2025)
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
by: Fein-Ashley, Jacob, et al.
Published: (2025)