| Main Authors: | Zhu, Youheng, Lu, Yiping |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.01381 |
Similar Items
A Covering Framework for Offline POMDPs Learning using Belief Space Metric
by: Zhu, Youheng, et al.
Published: (2026)
Inference-Time Scaling for Generalist Reward Modeling
by: Liu, Zijun, et al.
Published: (2025)
Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
by: Zhao, Wenshuo, et al.
Published: (2026)
RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance
by: Chen, Tianlang, et al.
Published: (2025)
TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
by: Zhang, Dan, et al.
Published: (2025)
Inference-Time Hyper-Scaling with KV Cache Compression
by: Łańcucki, Adrian, et al.
Published: (2025)
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
by: Wang, Xiyao, et al.
Published: (2024)
What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation?
by: Xu, Ruihan, et al.
Published: (2025)
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures
by: Wang, Chenyang, et al.
Published: (2026)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment
by: Wang, Ye, et al.
Published: (2026)
GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
by: Wang, Chenglong, et al.
Published: (2025)
Bayesian Preference Learning for Test-Time Steerable Reward Models
by: Hong, Jiwoo, et al.
Published: (2026)
MarkovScale: Towards Optimal Sequential Scaling at Inference Time
by: Wang, Youkang, et al.
Published: (2026)
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
by: Hou, Zhenyu, et al.
Published: (2025)
Noise Contrastive Alignment of Language Models with Explicit Rewards
by: Chen, Huayu, et al.
Published: (2024)
GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler
by: Wang, Minghan, et al.
Published: (2026)
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
by: Rafailov, Rafael, et al.
Published: (2024)
Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality
by: Hu, Zhimin, et al.
Published: (2026)
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
by: Setlur, Amrith, et al.
Published: (2024)
Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
by: Karlekar, Sweta, et al.
Published: (2026)
Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models
by: Wang, Jian, et al.
Published: (2025)
Scaling Inference-Efficient Language Models
by: Bian, Song, et al.
Published: (2025)
Cascade Reward Sampling for Efficient Decoding-Time Alignment
by: Li, Bolian, et al.
Published: (2024)
Process Reward Models That Think
by: Khalifa, Muhammad, et al.
Published: (2025)
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
by: Ma, Qiyao, et al.
Published: (2026)
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
by: Sardana, Nikhil, et al.
Published: (2023)
Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
by: Wu, Jiayun, et al.
Published: (2026)
Latent Thought Models with Variational Bayes Inference-Time Computation
by: Kong, Deqian, et al.
Published: (2025)
Test-Time Scaling with Reflective Generative Model
by: Wang, Zixiao, et al.
Published: (2025)
Entropy-Regularized Process Reward Model
by: Zhang, Hanning, et al.
Published: (2024)
On Almost Surely Safe Alignment of Large Language Models at Inference-Time
by: Ji, Xiaotong, et al.
Published: (2025)
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
by: Li, Hao, et al.
Published: (2023)
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
by: Song, Yuxuan, et al.
Published: (2025)
Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling
by: Giannone, Giorgio, et al.
Published: (2025)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness
by: Lu, Jingyu, et al.
Published: (2026)
Expected Reward Prediction, with Applications to Model Routing
by: Hasanaliyev, Kenan, et al.
Published: (2026)
Bootstrapping Language Models with DPO Implicit Rewards
by: Chen, Changyu, et al.
Published: (2024)
RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)