:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Xiaoqian, Wang, Ke, Li, Yongbin, Wu, Yuchuan, Ma, Wentao, Kong, Aobo, Huang, Fei, Jiao, Jianbin, Zhang, Junge
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.12486

Similar Items

Agentic Reinforcement Learning with Implicit Step Rewards
by: Liu, Xiaoqian, et al.
Published: (2025)

SDPO: Segment-Level Direct Preference Optimization for Social Agents
by: Kong, Aobo, et al.
Published: (2025)

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
by: Ye, Xinge, et al.
Published: (2025)

FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
by: Xiao, Ruixuan, et al.
Published: (2024)

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
by: Zhou, Qinhao, et al.
Published: (2024)

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
by: Xu, Wujiang, et al.
Published: (2025)

MOA: Multi-Objective Alignment for Role-Playing Agents
by: Liao, Chonghua, et al.
Published: (2025)

Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges
by: Liu, Xiaoqian, et al.
Published: (2023)

TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence
by: Hou, Guiyang, et al.
Published: (2025)

Reverse Preference Optimization for Complex Instruction Following
by: Huang, Xiang, et al.
Published: (2025)

Fine-Tuning Language Models with Reward Learning on Policy
by: Lang, Hao, et al.
Published: (2024)

Position: Foundation Agents as the Paradigm Shift for Decision Making
by: Liu, Xiaoqian, et al.
Published: (2024)

Improving Factual Consistency of News Summarization by Contrastive Preference Optimization
by: Feng, Huawen, et al.
Published: (2023)

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
by: Zhang, Xinghua, et al.
Published: (2024)

Adaptive Social Learning via Mode Policy Optimization for Language Agents
by: Wang, Minzheng, et al.
Published: (2025)

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents
by: Si, Shuzheng, et al.
Published: (2023)

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)

A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
by: Zhang, Junjie, et al.
Published: (2025)

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
by: Chen, Changyu, et al.
Published: (2024)

Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format
by: Wang, Dingzirui, et al.
Published: (2025)

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)

Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs
by: Cui, Jing, et al.
Published: (2025)

Calibration-Aware Policy Optimization for Reasoning LLMs
by: Wang, Ziqi, et al.
Published: (2026)

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
by: Zhao, Zhengyang, et al.
Published: (2026)

Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning
by: Lin, Jiacheng, et al.
Published: (2025)

Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use
by: Chen, Yuhan, et al.
Published: (2023)

HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
by: Deng, Ken, et al.
Published: (2025)

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
by: Feng, Jiazhan, et al.
Published: (2025)

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
by: Li, Zhuoqun, et al.
Published: (2024)

Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
by: Lu, Keer, et al.
Published: (2025)

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)

RLKD: Distilling LLMs' Reasoning via Reinforcement Learning
by: Xu, Shicheng, et al.
Published: (2025)

Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning
by: Zhao, Shiwan, et al.
Published: (2025)

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
by: Yang, Wanli, et al.
Published: (2026)

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
by: Zhang, Pinyi, et al.
Published: (2026)

The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends
by: Zhang, Xinghua, et al.
Published: (2024)

Debate Helps Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)

Selective Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)

Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
by: Lou, Xingzhou, et al.
Published: (2024)

MAPEX: A Multi-Agent Pipeline for Keyphrase Extraction
by: Zhang, Liting, et al.
Published: (2025)