:: Library Catalog

Bibliographic Details
Main Authors:	Feng, Zihao; Wang, Xiaoxue; Bai, Ziwei; Su, Donghang; Wu, Bowen; Yu, Qun; Wang, Baoxun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.13592

Similar Items

ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning
by: Feng, Zihao, et al.
Published: (2025)

RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward
by: Wang, Zongsheng, et al.
Published: (2025)

Interpersonal Memory Matters: A New Task for Proactive Dialogue Utilizing Conversational History
by: Wu, Bowen, et al.
Published: (2025)

LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
by: Zhao, Yi, et al.
Published: (2025)

GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection
by: Zhang, Shuguang, et al.
Published: (2026)

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)

Towards the Holographic Characteristic of LLMs for Efficient Short-text Generation
by: Qian, Shun, et al.
Published: (2026)

Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
by: Jiang, Shuyang, et al.
Published: (2025)

Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation
by: Yang, Zixuan, et al.
Published: (2026)

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)

$λ$-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences
by: Wang, Yining, et al.
Published: (2025)

S-GRPO: Unified Post-Training for Large Vision-Language Models
by: Yan, Yuming, et al.
Published: (2026)

GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation
by: Kochar, Dimple Vijay, et al.
Published: (2026)

The Bidirectional Process Reward Model
by: Zhang, Lingyin, et al.
Published: (2025)

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
by: Deng, Jingcheng, et al.
Published: (2026)

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning
by: Chen, Xiwen, et al.
Published: (2025)

Learning to Explain: Prototype-Based Surrogate Models for LLM Classification
by: Wei, Bowen, et al.
Published: (2025)

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
by: Pappone, Francesco, et al.
Published: (2025)

Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL with GRPO
by: Kattamuri, Ashish, et al.
Published: (2025)

Can GRPO Boost Complex Multimodal Table Understanding?
by: Kang, Xiaoqiang, et al.
Published: (2025)

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
by: Tang, Yixuan, et al.
Published: (2025)

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)

It Takes Two: Your GRPO Is Secretly DPO
by: Wu, Yihong, et al.
Published: (2025)

MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
by: Feng, Zhaopeng, et al.
Published: (2025)

Libra: Assessing and Improving Reward Model by Learning to Think
by: Zhou, Meng, et al.
Published: (2025)

Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents
by: Wang, Ziyi, et al.
Published: (2026)

Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation
by: Su, Hongyang, et al.
Published: (2026)

Bridging Thoughts and Words: Graph-Based Intent-Semantic Joint Learning for Fake News Detection
by: Wang, Zhengjia, et al.
Published: (2025)

GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
by: Dipta, Shubhashis Roy, et al.
Published: (2026)

ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding
by: Sun, Zhongxiang, et al.
Published: (2025)

Detecting Conversational Mental Manipulation with Intent-Aware Prompting
by: Ma, Jiayuan, et al.
Published: (2024)

Towards Understanding the Influence of Reward Margin on Preference Model Performance
by: Qin, Bowen, et al.
Published: (2024)

Multi-Reward GRPO Fine-Tuning for De-biasing Large Language Models: A Study Based on Chinese-Context Discrimination Data
by: Yixuan, Deng, et al.
Published: (2025)

Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement
by: Cheng, Zihao, et al.
Published: (2024)

Advancing Interpretability in Text Classification through Prototype Learning
by: Wei, Bowen, et al.
Published: (2024)

M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
by: Bai, Bizhe, et al.
Published: (2025)

Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection
by: Wang, Pei, et al.
Published: (2024)

Generate then Refine: Data Augmentation for Zero-shot Intent Detection
by: Lin, I-Fan, et al.
Published: (2024)

Intent-driven In-context Learning for Few-shot Dialogue State Tracking
by: Yi, Zihao, et al.
Published: (2024)

Intent Detection in the Age of LLMs
by: Arora, Gaurav, et al.
Published: (2024)