| Main Authors: | Wu, Xiong Jun, Zhang, Zhenduo, Wen, ZuJie, Zhang, Zhiqiang, Ren, Wang, Shi, Lei, Chen, Cai, Zhao, Deng, Wang, Qing, Han, Xudong, Tang, Chengfu, Jin, Dingnan, Cui, Qing, Zhou, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.14147 |
Similar Items
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning
by: Wang, Hao, et al.
Published: (2026)
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
by: Ling Team, et al.
Published: (2025)
DiffScore: Text Evaluation Beyond Autoregressive Likelihood
by: Lai, Wen, et al.
Published: (2026)
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
by: Tang, Xinyu, et al.
Published: (2025)
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards
by: Tang, Xinyu, et al.
Published: (2025)
ReAlign: Generalizable Image Forgery Detection via Reasoning-Aligned Representation
by: Huang, Qing, et al.
Published: (2026)
On Representation Redundancy in Large-Scale Instruction Tuning Data Selection
by: Shu, Youwei, et al.
Published: (2026)
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
by: Cheng, Xiaoxue, et al.
Published: (2025)
SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis
by: Zhao, Bingxuan, et al.
Published: (2026)
Reasons to Reject? Aligning Language Models with Judgments
by: Xu, Weiwen, et al.
Published: (2023)
From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning
by: Zhang, Shihao, et al.
Published: (2026)
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree
by: Peng, Qi, et al.
Published: (2025)
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
by: Yang, Jinluan, et al.
Published: (2025)
What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code
by: Zhao, Yuze, et al.
Published: (2026)
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
by: Lei, Fangyu, et al.
Published: (2025)
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
by: Chen, Haoyang, et al.
Published: (2026)
Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning
by: Cui, Jin, et al.
Published: (2026)
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
by: Deng, Hexuan, et al.
Published: (2025)
Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
by: Cai, Hua, et al.
Published: (2025)
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
by: Yan, Shaotian, et al.
Published: (2026)
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
by: Yan, Yuchen, et al.
Published: (2026)
Aligning Green Finance, Technological Innovation, and Energy Consumption for CO2 Emission Reduction: Insights From BRICS+ Countries
by: Chengfu Mu, et al.
Published: (2026)
SURE: Semi-dense Uncertainty-REfined Feature Matching
by: Li, Sicheng, et al.
Published: (2026)
Enhanced Electrochemical Performance of NiMn Layered Double Hydroxides/Graphene Oxide Composites Synthesized by One‐Step Hydrothermal Method for Supercapacitors
by: Jun Chen, et al.
Published: (2024)
Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning
by: Wang, Zhilin, et al.
Published: (2025)
Exploration of the Photoluminescence Behavior and Emission Mechanism of Thioester Polyacrylamide Tablets During the Gradual Increase of Molecular Weight
by: Qing Zhou, et al.
Published: (2024)
Continuity Reinforcement Skeleton for Pixel-based Haptic Display
by: Wang, Xinyuan, et al.
Published: (2024)
Experimental Study on Mechanical Properties and Micromechanisms of Cement‐Improved Granite Residual Soil
by: Jun Xiong, et al.
Published: (2025)
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
by: Wang, Jiapeng, et al.
Published: (2025)
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
by: Chen, Huayu, et al.
Published: (2025)
ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing
by: Li, Hengjia, et al.
Published: (2026)
Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale
by: Acuna, David, et al.
Published: (2025)
MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
by: Li, Jiazheng, et al.
Published: (2025)
Measuring the tilt of primordial gravitational-wave power spectrum from observations
by: Li, Jun, et al.
Published: (2019)
Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA
by: Shen, Xiangqing, et al.
Published: (2025)
Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games
by: Fan, Mingyuan, et al.
Published: (2026)
Realizing modular data from centers of near-group categories
by: Yu, Zhiqiang, et al.
Published: (2024)
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)
DiffuReason: Bridging Latent Reasoning and Generative Refinement for Sequential Recommendation
by: Jiang, Jie, et al.
Published: (2026)