| Main Authors: | Wang, Xiaoxuan; Liu, Bo; Jiang, Song; Liu, Jingzhou; Qi, Jingyuan; Chen, Xia; He, Baosheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.15137 |
Similar Items
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
by: Liu, Yixin, et al.
Published: (2026)
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning
by: Huang, Yuzhen, et al.
Published: (2025)
Incentivizing LLMs to Self-Verify Their Answers
by: Zhang, Fuxiang, et al.
Published: (2025)
Divide-Verify-Refine: Can LLMs Self-Align with Complex Instructions?
by: Zhang, Xianren, et al.
Published: (2024)
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs
by: Wang, Ruoyu, et al.
Published: (2024)
Instruction Learning Paradigms: A Dual Perspective on White-box and Black-box LLMs
by: Ren, Yanwei, et al.
Published: (2025)
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)
Can LLMs Learn to Reason Robustly under Noisy Supervision?
by: Yang, Shenzhi, et al.
Published: (2026)
Regularized Multi-LLMs Collaboration for Enhanced Score-based Causal Discovery
by: Li, Xiaoxuan, et al.
Published: (2024)
Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems
by: Xia, Yifan, et al.
Published: (2024)
SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation
by: Ren, Yanwei, et al.
Published: (2025)
From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
by: Chen, Jiaxiang, et al.
Published: (2025)
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
by: Fan, Jiajun, et al.
Published: (2025)
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
by: Singhi, Nishad, et al.
Published: (2025)
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
by: Zhang, Zijing, et al.
Published: (2025)
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning
by: Jiang, Xitai, et al.
Published: (2026)
Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting
by: Cheng, Mingyue, et al.
Published: (2025)
Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs
by: Zhou, Yitong, et al.
Published: (2025)
Full Bayesian Significance Testing for Neural Networks
by: Liu, Zehua, et al.
Published: (2024)
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
by: Liu, Wei, et al.
Published: (2025)
EVGeoQA: Benchmarking LLMs on Dynamic, Multi-Objective Geo-Spatial Exploration
by: Wu, Jianfei, et al.
Published: (2026)
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
by: Zhao, Yanxiao, et al.
Published: (2025)
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
by: Sun, Yiyou, et al.
Published: (2025)
On the Equivalence of Graph Convolution and Mixup
by: Han, Xiaotian, et al.
Published: (2023)
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
by: Sareen, Kusha, et al.
Published: (2025)
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
by: Li, Bo, et al.
Published: (2025)
A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs
by: Hettige, Kethmi Hirushini, et al.
Published: (2025)
Unified Parameter-Efficient Unlearning for LLMs
by: Ding, Chenlu, et al.
Published: (2024)
Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation
by: Zhong, Yisheng, et al.
Published: (2026)
GAPO: Robust Advantage Estimation for Real-World Code LLMs
by: Zhang, Jianqing, et al.
Published: (2025)
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)
From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code
by: Mahesh, Anirudh Jaidev, et al.
Published: (2026)
Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives
by: Wang, Zecheng, et al.
Published: (2026)
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
by: Xu, Wenzhe, et al.
Published: (2026)
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
by: Qu, Yuxiao, et al.
Published: (2025)
GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning
by: Wang, Jingyi, et al.
Published: (2026)
Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs
by: Salimi, Moein, et al.
Published: (2026)
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
by: Chen, Lizhang, et al.
Published: (2023)
Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
by: Zhang, Yifei, et al.
Published: (2026)