| Main Authors: | Wang, Jiakang, Liu, Runze, Cai, Qingpeng, Lin, Lei, Hu, Wenping, Li, Xiu, Zhang, Fuzheng, Zhou, Guorui, Gai, Kun, Pan, Ling |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.06062 |
Similar Items
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
by: Liu, Runze, et al.
Published: (2025)
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
by: Wang, Jiakang, et al.
Published: (2025)
CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
by: Su, Zhenpeng, et al.
Published: (2025)
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
by: Su, Zhenpeng, et al.
Published: (2025)
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
by: Su, Zhenpeng, et al.
Published: (2025)
AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems
by: Xue, Zhenghai, et al.
Published: (2023)
Misallocation in the Chinese land market
by: Xuan Fei, et al.
Published: (2024)
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
by: Zhang, Jingyuan, et al.
Published: (2025)
Random Policy Evaluation Uncovers Policies of Generative Flow Networks
by: He, Haoran, et al.
Published: (2024)
State Regularized Policy Optimization on Data with Dynamics Shift
by: Xue, Zhenghai, et al.
Published: (2023)
Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning
by: Ji, Xingguang, et al.
Published: (2025)
Hierarchical Semantic RL: Tackling the Problem of Dynamic Action Space for RL-based Recommendations
by: Wang, Minmao, et al.
Published: (2025)
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
by: Ou, Jiao, et al.
Published: (2024)
ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning
by: Tang, Yihong, et al.
Published: (2024)
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
by: Tang, Yihong, et al.
Published: (2024)
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
by: Wang, Xu, et al.
Published: (2024)
MISS: Multi-Modal Tree Indexing and Searching with Lifelong Sequential Behavior for Retrieval Recommendation
by: Guo, Chengcheng, et al.
Published: (2025)
Future Impact Decomposition in Request-level Recommendations
by: Wang, Xiaobei, et al.
Published: (2024)
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention
by: Liu, Ziru, et al.
Published: (2024)
AIS: Adaptive Importance Sampling for Quantized RL
by: Zhou, Jiajun, et al.
Published: (2026)
Chaos and Misallocation under Price Controls
by: Albrecht, Brian C., et al.
Published: (2026)
Symposium on Misallocation and Structural Transformation: Introduction
by: Tasso Adamopoulos, et al.
Published: (2024)
Production Function Estimation With Resource Misallocation
by: Shigang Li, et al.
Published: (2026)
DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
by: Wang, Shaobo, et al.
Published: (2026)
Bifurcated Generative Flow Networks
by: Li, Chunhui, et al.
Published: (2024)
Video Object Segmentation with Dynamic Query Modulation
by: Zhou, Hantao, et al.
Published: (2024)
FIM: Frequency-Aware Multi-View Interest Modeling for Local-Life Service Recommendation
by: Wang, Guoquan, et al.
Published: (2025)
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
by: Ou, Jiao, et al.
Published: (2023)
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)
The Impact of New Digital Infrastructure on Resource Misallocation
by: Qunli Wang, et al.
Published: (2026)
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
by: Lin, Lei, et al.
Published: (2023)
The Cancellation Hypothesis in Critic-Free RL: From Outcome Rewards to Token Credits
by: Cheng, Tianhao, et al.
Published: (2026)
CRM: Retrieval Model with Controllable Condition
by: Liu, Chi, et al.
Published: (2024)
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
by: Guo, Chengcheng, et al.
Published: (2026)
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
by: Jia, Jian, et al.
Published: (2025)
Tournament-Based Performance Evaluation and Systematic Misallocation: Why Forced Ranking Systems Produce Random Outcomes
by: McEntire, Jeremy
Published: (2025)
How Metro Expansion Influences Enterprise Labor Misallocation
by: Mengting Zhang, et al.
Published: (2026)
Generative Auto-Bidding with Value-Guided Explorations
by: Gao, Jingtong, et al.
Published: (2025)
Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
by: Sun, Yuchong, et al.
Published: (2023)
GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework
by: Sun, Yijia, et al.
Published: (2025)