| Main Authors: | Ai, Rui, Pan, Yu, Simchi-Levi, David, Wang, Chonghuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.29871 |
Similar Items
Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information
by: Ai, Rui, et al.
Published: (2025)
LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency
by: Li, Jiachun, et al.
Published: (2026)
OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents
by: Ao, Ruicheng, et al.
Published: (2026)
Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality
by: Chen, Shuze, et al.
Published: (2024)
DistShap: Scalable GNN Explanations with Distributed Shapley Values
by: Akkas, Selahattin, et al.
Published: (2025)
ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values
by: Bewley, Tom, et al.
Published: (2026)
ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
by: Ao, Ruicheng, et al.
Published: (2026)
ShapG: new feature importance method based on the Shapley value
by: Zhao, Chi, et al.
Published: (2024)
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints
by: Ao, Ruicheng, et al.
Published: (2025)
Large Language Models for Supply Chain Decisions
by: Simchi-Levi, David, et al.
Published: (2025)
What Matters in Data for DPO?
by: Pan, Yu, et al.
Published: (2025)
MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)
Multi-agent Adaptive Mechanism Design
by: Han, Qiushi, et al.
Published: (2025)
GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models
by: Wang, Zhijie
Published: (2026)
GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
by: Hong, Haoyang, et al.
Published: (2025)
Beyond Covariance Matrix: The Statistical Complexity of Private Linear Regression
by: Chen, Fan, et al.
Published: (2025)
GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)
Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
by: Liu, Shuze Daniel, et al.
Published: (2026)
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
by: Chen, Minghan, et al.
Published: (2025)
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)
When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO
by: Li, Yu, et al.
Published: (2026)
Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System
by: Li, Yanming, et al.
Published: (2026)
Designing Service Systems from Textual Evidence
by: Ao, Ruicheng, et al.
Published: (2026)
Owen-based Semantics and Hierarchy-Aware Explanation (O-Shap)
by: Zhou, Xiangyu, et al.
Published: (2026)
Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning
by: Yang, Pei, et al.
Published: (2025)
Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
by: Ramesh, Shyam Sundhar, et al.
Published: (2026)
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)
Improving LLM-Generated Code Quality with GRPO
by: Robeyns, Maxime, et al.
Published: (2025)
SyntaxShap: Syntax-aware Explainability Method for Text Generation
by: Amara, Kenza, et al.
Published: (2024)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
by: Pappone, Francesco, et al.
Published: (2025)
Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL with GRPO
by: Kattamuri, Ashish, et al.
Published: (2025)
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)
GESA: Graph-Enhanced Semantic Allocation for Generalized, Fair, and Explainable Candidate-Role Matching
by: Shah, Rishi Ashish, et al.
Published: (2025)
iGRPO: Self-Feedback-Driven LLM Reasoning
by: Hatamizadeh, Ali, et al.
Published: (2026)
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
by: Feng, Xiachong, et al.
Published: (2026)
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
by: Huang, Runhui, et al.
Published: (2026)
GenAI vs. Human Creators: Procurement Mechanism Design in Two-/Three-Layer Markets
by: Ai, Rui, et al.
Published: (2025)
From Confounding to Learning: Dynamic Service Fee Pricing on Third-Party Platforms
by: Ai, Rui, et al.
Published: (2025)