:: Library Catalog

Bibliographic Details
Main Authors:	Ai, Rui; Pan, Yu; Simchi-Levi, David; Wang, Chonghuan
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.29871

Similar Items

Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information
by: Ai, Rui, et al.
Published: (2025)

LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency
by: Li, Jiachun, et al.
Published: (2026)

OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents
by: Ao, Ruicheng, et al.
Published: (2026)

Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality
by: Chen, Shuze, et al.
Published: (2024)

DistShap: Scalable GNN Explanations with Distributed Shapley Values
by: Akkas, Selahattin, et al.
Published: (2025)

ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values
by: Bewley, Tom, et al.
Published: (2026)

ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
by: Ao, Ruicheng, et al.
Published: (2026)

ShapG: new feature importance method based on the Shapley value
by: Zhao, Chi, et al.
Published: (2024)

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints
by: Ao, Ruicheng, et al.
Published: (2025)

Large Language Models for Supply Chain Decisions
by: Simchi-Levi, David, et al.
Published: (2025)

What Matters in Data for DPO?
by: Pan, Yu, et al.
Published: (2025)

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)

Multi-agent Adaptive Mechanism Design
by: Han, Qiushi, et al.
Published: (2025)

GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models
by: Wang, Zhijie
Published: (2026)

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
by: Hong, Haoyang, et al.
Published: (2025)

Beyond Covariance Matrix: The Statistical Complexity of Private Linear Regression
by: Chen, Fan, et al.
Published: (2025)

GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
by: Liu, Shuze Daniel, et al.
Published: (2026)

SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
by: Chen, Minghan, et al.
Published: (2025)

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)

When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO
by: Li, Yu, et al.
Published: (2026)

Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System
by: Li, Yanming, et al.
Published: (2026)

Designing Service Systems from Textual Evidence
by: Ao, Ruicheng, et al.
Published: (2026)

Owen-based Semantics and Hierarchy-Aware Explanation (O-Shap)
by: Zhou, Xiangyu, et al.
Published: (2026)

Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning
by: Yang, Pei, et al.
Published: (2025)

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
by: Ramesh, Shyam Sundhar, et al.
Published: (2026)

Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)

Improving LLM-Generated Code Quality with GRPO
by: Robeyns, Maxime, et al.
Published: (2025)

SyntaxShap: Syntax-aware Explainability Method for Text Generation
by: Amara, Kenza, et al.
Published: (2024)

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
by: Pappone, Francesco, et al.
Published: (2025)

Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL with GRPO
by: Kattamuri, Ashish, et al.
Published: (2025)

RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)

GESA: Graph-Enhanced Semantic Allocation for Generalized, Fair, and Explainable Candidate-Role Matching
by: Shah, Rishi Ashish, et al.
Published: (2025)

iGRPO: Self-Feedback-Driven LLM Reasoning
by: Hatamizadeh, Ali, et al.
Published: (2026)

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
by: Feng, Xiachong, et al.
Published: (2026)

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
by: Huang, Runhui, et al.
Published: (2026)

GenAI vs. Human Creators: Procurement Mechanism Design in Two-/Three-Layer Markets
by: Ai, Rui, et al.
Published: (2025)

From Confounding to Learning: Dynamic Service Fee Pricing on Third-Party Platforms
by: Ai, Rui, et al.
Published: (2025)