:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Jiakang; Liu, Runze; Cai, Qingpeng; Lin, Lei; Hu, Wenping; Li, Xiu; Zhang, Fuzheng; Zhou, Guorui; Gai, Kun; Pan, Ling
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.06062

Similar Items

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
by: Liu, Runze, et al.
Published: (2025)

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
by: Wang, Jiakang, et al.
Published: (2025)

CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
by: Su, Zhenpeng, et al.
Published: (2025)

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
by: Su, Zhenpeng, et al.
Published: (2025)

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
by: Su, Zhenpeng, et al.
Published: (2025)

AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems
by: Xue, Zhenghai, et al.
Published: (2023)

Misallocation in the Chinese land market
by: Xuan Fei, et al.
Published: (2024)

Leanabell-Prover: Posttraining Scaling in Formal Reasoning
by: Zhang, Jingyuan, et al.
Published: (2025)

Random Policy Evaluation Uncovers Policies of Generative Flow Networks
by: He, Haoran, et al.
Published: (2024)

State Regularized Policy Optimization on Data with Dynamics Shift
by: Xue, Zhenghai, et al.
Published: (2023)

Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning
by: Ji, Xingguang, et al.
Published: (2025)

Hierarchical Semantic RL: Tackling the Problem of Dynamic Action Space for RL-based Recommendations
by: Wang, Minmao, et al.
Published: (2025)

Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
by: Ou, Jiao, et al.
Published: (2024)

ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning
by: Tang, Yihong, et al.
Published: (2024)

Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
by: Tang, Yihong, et al.
Published: (2024)

HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
by: Wang, Xu, et al.
Published: (2024)

MISS: Multi-Modal Tree Indexing and Searching with Lifelong Sequential Behavior for Retrieval Recommendation
by: Guo, Chengcheng, et al.
Published: (2025)

Future Impact Decomposition in Request-level Recommendations
by: Wang, Xiaobei, et al.
Published: (2024)

Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention
by: Liu, Ziru, et al.
Published: (2024)

AIS: Adaptive Importance Sampling for Quantized RL
by: Zhou, Jiajun, et al.
Published: (2026)

Chaos and Misallocation under Price Controls
by: Albrecht, Brian C., et al.
Published: (2026)

Symposium on Misallocation and Structural Transformation: Introduction
by: Tasso Adamopoulos, et al.
Published: (2024)

Production Function Estimation With Resource Misallocation
by: Shigang Li, et al.
Published: (2026)

DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
by: Wang, Shaobo, et al.
Published: (2026)

Bifurcated Generative Flow Networks
by: Li, Chunhui, et al.
Published: (2024)

Video Object Segmentation with Dynamic Query Modulation
by: Zhou, Hantao, et al.
Published: (2024)

FIM: Frequency-Aware Multi-View Interest Modeling for Local-Life Service Recommendation
by: Wang, Guoquan, et al.
Published: (2025)

DialogBench: Evaluating LLMs as Human-like Dialogue Systems
by: Ou, Jiao, et al.
Published: (2023)

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)

The Impact of New Digital Infrastructure on Resource Misallocation
by: Qunli Wang, et al.
Published: (2026)

Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
by: Lin, Lei, et al.
Published: (2023)

The Cancellation Hypothesis in Critic-Free RL: From Outcome Rewards to Token Credits
by: Cheng, Tianhao, et al.
Published: (2026)

CRM: Retrieval Model with Controllable Condition
by: Liu, Chi, et al.
Published: (2024)

PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
by: Guo, Chengcheng, et al.
Published: (2026)

From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
by: Jia, Jian, et al.
Published: (2025)

Tournament-Based Performance Evaluation and Systematic Misallocation: Why Forced Ranking Systems Produce Random Outcomes
by: McEntire, Jeremy
Published: (2025)

How Metro Expansion Influences Enterprise Labor Misallocation
by: Mengting Zhang, et al.
Published: (2026)

Generative Auto-Bidding with Value-Guided Explorations
by: Gao, Jingtong, et al.
Published: (2025)

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
by: Sun, Yuchong, et al.
Published: (2023)

GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework
by: Sun, Yijia, et al.
Published: (2025)