:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Tong; Qian, Cheng; Cief, Matej; He, Yuan; Dan, Daniele; Aletras, Nikolaos; Kazai, Gabriella
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2606.00135

Similar Items

Bridging the Gap: From Ad-hoc to Proactive Search in Conversations
by: Meng, Chuan, et al.
Published: (2025)

Pessimistic Off-Policy Optimization for Learning to Rank
by: Cief, Matej, et al.
Published: (2022)

Learning Action Embeddings for Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2023)

Incorporating Attribution Importance for Improving Faithfulness Metrics
by: Zhao, Zhixue, et al.
Published: (2023)

Where does output diversity collapse in post-training?
by: Karouzos, Constantinos, et al.
Published: (2026)

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift
by: Karouzos, Constantinos, et al.
Published: (2026)

ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)

Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch
by: Zeng, Yirong, et al.
Published: (2025)

On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)

Cross-Validated Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2024)

Skill Reuse as Compression in Agentic RL
by: Xu, Zhikun, et al.
Published: (2026)

SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
by: Wang, Prince Zizhuang, et al.
Published: (2026)

CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL
by: Mai, Shinji, et al.
Published: (2025)

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
by: He, Zelin, et al.
Published: (2026)

MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)

ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
by: Zhoubian, Sining, et al.
Published: (2025)

DiRL: An Efficient Post-Training Framework for Diffusion Language Models
by: Zhu, Ying, et al.
Published: (2025)

RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure
by: Gao, Wei, et al.
Published: (2025)

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training
by: Hu, Yuelin, et al.
Published: (2026)

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
by: Li, Zhuofeng, et al.
Published: (2025)

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models
by: Villegas, Danae Sánchez, et al.
Published: (2026)

QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training-Inference Mismatch
by: Gu, Hao, et al.
Published: (2026)

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL
by: Lu, Xingyu, et al.
Published: (2026)

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
by: Gan, Guo, et al.
Published: (2026)

Supplement Generation Training for Enhancing Agentic Task Performance
by: Cho, Young Min, et al.
Published: (2026)

Topology-Aware Revival for Efficient Sparse Training
by: Jin, Meiling, et al.
Published: (2026)

Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail
by: Li, Yingru, et al.
Published: (2025)

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
by: Ye, Chenlu, et al.
Published: (2025)

An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
by: Guo, Yiran, et al.
Published: (2025)

GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
by: Xu, Haofeng, et al.
Published: (2026)

What Limits Agentic Systems Efficiency?
by: Bian, Song, et al.
Published: (2025)

Agentic Critical Training
by: Liu, Weize, et al.
Published: (2026)

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
by: Qian, Cheng, et al.
Published: (2025)

Token-Efficient RL for LLM Reasoning
by: Lee, Alan, et al.
Published: (2025)

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
by: Dai, Weinan, et al.
Published: (2026)

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
by: Li, Conglong, et al.
Published: (2022)

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
by: Li, Yingru, et al.
Published: (2026)

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)

MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
by: Feng, Laingjun, et al.
Published: (2025)