| Main Authors: | Han, Kevin, Zhou, Yuhang, Gao, Mingze, Zhou, Gedi, Li, Serena, Kumar, Abhishek, Fan, Xiangjun, Li, Weiwei, Zhang, Lizhu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.05165 |
Similar Items
LLM-Driven Reasoning for Constraint-Aware Feature Selection in Industrial Systems
by: Zhou, Yuhang, et al.
Published: (2026)
Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding
by: Zhou, Yuhang, et al.
Published: (2025)
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
by: Zhou, Yuhang, et al.
Published: (2026)
Synthetic Sandbox for Training Machine Learning Engineering Agents
by: Zhou, Yuhang, et al.
Published: (2026)
Group Sequence Policy Optimization
by: Zheng, Chujie, et al.
Published: (2025)
Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization
by: Cang, Yueyang, et al.
Published: (2026)
GEM: Empowering LLM for both Embedding Generation and Language Understanding
by: Zhang, Caojin, et al.
Published: (2025)
Training-Free Group Relative Policy Optimization
by: Cai, Yuzheng, et al.
Published: (2025)
Agentic Recommender System with Hierarchical Belief-State Memory
by: Shen, Xiang, et al.
Published: (2026)
On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement
by: Deng, Wenlong, et al.
Published: (2025)
GTPO: Stabilizing Group Relative Policy Optimization via Gradient and Entropy Control
by: Simoni, Marco, et al.
Published: (2025)
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
by: Hu, Shijing, et al.
Published: (2025)
Constrained Group Relative Policy Optimization
by: Girgis, Roger, et al.
Published: (2026)
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
by: Xu, Austin, et al.
Published: (2025)
TARo: Token-level Adaptive Routing for LLM Test-time Alignment
by: Rai, Arushi, et al.
Published: (2026)
LIONs: An Empirically Optimized Approach to Align Language Models
by: Yu, Xiao, et al.
Published: (2024)
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
by: Deng, Jingcheng, et al.
Published: (2026)
From Text to Insight: Leveraging Large Language Models for Performance Evaluation in Management
by: Li, Ning, et al.
Published: (2024)
Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
by: Wang, Jialu, et al.
Published: (2026)
Token-Level LLM Collaboration via FusionRoute
by: Xiong, Nuoya, et al.
Published: (2026)
S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning
by: Zeng, Hanqing, et al.
Published: (2025)
Agentic Policy Optimization via Instruction-Policy Co-Evolution
by: Zhou, Han, et al.
Published: (2025)
Stabilizing Policy Optimization via Logits Convexity
by: Chen, Hongzhan, et al.
Published: (2026)
Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering?
by: Akarajaradwong, Pawitsapak, et al.
Published: (2025)
The Comparison of Translationese in Machine Translation and Human Transation in terms of Translation Relations
by: Zhou, Fan
Published: (2024)
ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation
by: Wang, Zhebo, et al.
Published: (2026)
Decompiling Rust: An Empirical Study of Compiler Optimizations and Reverse Engineering Challenges
by: Zhou, Zixu
Published: (2025)
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
by: Deng, Wenlong, et al.
Published: (2025)
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
by: Li, Chen, et al.
Published: (2025)
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
by: Simoni, Marco, et al.
Published: (2025)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning
by: Yu, Hao, et al.
Published: (2025)
Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization
by: Ding, Yifeng, et al.
Published: (2025)
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
by: Wu, Haoyuan, et al.
Published: (2025)
Recursive Agent Optimization
by: Gandhi, Apurva, et al.
Published: (2026)
CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents
by: Tang, Yihong, et al.
Published: (2026)
RePO: Replay-Enhanced Policy Optimization
by: Li, Siheng, et al.
Published: (2025)
A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
by: Chen, Dingwei, et al.
Published: (2026)
ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
by: Mingze, Xu
Published: (2026)
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)