| Main Authors: | Peng, Kun, Tan, Conghui, Liu, Yu, Tang, Guohua, Sun, Zhongqian, Yang, Wei, Zhu, Zining, Jiang, Lei, Liu, Yanbing, Peng, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08533 |
Similar Items
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
by: Liao, Yi, et al.
Published: (2025)
Dialogues Aspect-based Sentiment Quadruple Extraction via Structural Entropy Minimization Partitioning
by: Peng, Kun, et al.
Published: (2025)
What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
by: Sui, Yuan, et al.
Published: (2025)
Emotion Transfer with Enhanced Prototype for Unseen Emotion Recognition in Conversation
by: Peng, Kun, et al.
Published: (2025)
T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction
by: Peng, Kun, et al.
Published: (2025)
Superior energy storage performance in NaNbO3‐based lead‐free ceramics under low electric field
by: Kun Liu, et al.
Published: (2024)
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search
by: Sun, Linzhuang, et al.
Published: (2024)
DanceGRPO: Unleashing GRPO on Visual Generation
by: Xue, Zeyue, et al.
Published: (2025)
Analysis of minimum orbital periods around d-dimensional charged black holes
by: Peng, Yan, et al.
Published: (2025)
Upper bound on the radius of the innermost photonsphere in the regular compact star spacetime
by: Liu, Guohua, et al.
Published: (2024)
Bounds on the minimum orbital periods of non-singular Hayward and Bardeen black holes
by: Liu, Guohua, et al.
Published: (2025)
MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
by: Guo, Diandian, et al.
Published: (2026)
Large Language Models as Agents in Two-Player Games
by: Liu, Yang, et al.
Published: (2024)
EMIT: Enhancing MLLMs for Industrial Anomaly Detection via Difficulty-Aware GRPO
by: Guan, Wei, et al.
Published: (2025)
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
by: Hong, Haoyang, et al.
Published: (2025)
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
by: Wu, Xiongbin, et al.
Published: (2026)
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
by: Zheng, Zhi, et al.
Published: (2025)
Adaptive Content Restriction for Large Language Models via Suffix Optimization
by: Li, Yige, et al.
Published: (2025)
TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Guided Optimization
by: Jing, Xuepeng, et al.
Published: (2026)
IRPO: Boosting Image Restoration via Post-training GRPO
by: Xu, Haoxuan, et al.
Published: (2025)
LinguaGame: A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation
by: Ye, Yuxiao, et al.
Published: (2026)
Trans-RAG: Query-Centric Vector Transformation for Secure Cross-Organizational Retrieval
by: Liu, Yu, et al.
Published: (2026)
Table-Filling via Mean Teacher for Cross-domain Aspect Sentiment Triplet Extraction
by: Peng, Kun, et al.
Published: (2024)
PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering
by: Liu, Yu, et al.
Published: (2026)
PerPO: Perceptual Preference Optimization via Discriminative Rewarding
by: Zhu, Zining, et al.
Published: (2025)
AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
by: Cai, Yuzhu, et al.
Published: (2026)
A Complete Mental Temporal Logic for Intelligent Agent
by: Cao, Zining
Published: (2025)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
Design and Optimization of Reinforcement Learning-Based Agents in Text-Based Games
by: Wang, Haonan, et al.
Published: (2025)
Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization
by: Cang, Yueyang, et al.
Published: (2026)
Semantic Reformulation Entropy for Robust Hallucination Detection in QA Tasks
by: Tong, Chaodong, et al.
Published: (2025)
OPERA: A Reinforcement Learning–Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval
by: Liu, Yu, et al.
Published: (2025)
GRPO-GCC: Enhancing Cooperation in Spatial Public Goods Games via Group Relative Policy Optimization with Global Cooperation Constraint
by: Yang, Zhaoqilin, et al.
Published: (2025)
Delay-Aware Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control with Model-based Stability Enhancement
by: Liu, Jiaqi, et al.
Published: (2024)
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
by: Fu, Zhongqian, et al.
Published: (2025)
DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO
by: Liu, Henglin, et al.
Published: (2025)
SI‐FloatDet: A Visual Inspection Method for Water Surface Cleaning Robots Based on Shallow Information Injection and Adaptive Spatial Refinement
by: Guohua Yu, et al.
Published: (2025)
LithoGRPO: Fast Inverse Lithography via GRPO Reinforced Flow Matching
by: Lai, Yao, et al.
Published: (2026)
Knowledge Dependency Estimation for Reliable Question Answering
by: Tong, Chaodong, et al.
Published: (2026)