Bibliographic Details
Main Authors:	Han, Zhenyu; You, Ansheng; Wang, Haibo; Luo, Kui; Yang, Guang; Shi, Wenqi; Chen, Menglong; Zhang, Sicheng; Lan, Zeshun; Deng, Chunshi; Ji, Huazhong; Liu, Wenjie; Huang, Yu; Zhang, Yixiang; Pan, Chenyi; Wang, Jing; Huang, Xin; Li, Chunsheng; Wu, Jianping
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.01663

_version_	1866908430371913728
author	Han, Zhenyu; You, Ansheng; Wang, Haibo; Luo, Kui; Yang, Guang; Shi, Wenqi; Chen, Menglong; Zhang, Sicheng; Lan, Zeshun; Deng, Chunshi; Ji, Huazhong; Liu, Wenjie; Huang, Yu; Zhang, Yixiang; Pan, Chenyi; Wang, Jing; Huang, Xin; Li, Chunsheng; Wu, Jianping
author_facet	Han, Zhenyu; You, Ansheng; Wang, Haibo; Luo, Kui; Yang, Guang; Shi, Wenqi; Chen, Menglong; Zhang, Sicheng; Lan, Zeshun; Deng, Chunshi; Ji, Huazhong; Liu, Wenjie; Huang, Yu; Zhang, Yixiang; Pan, Chenyi; Wang, Jing; Huang, Xin; Li, Chunsheng; Wu, Jianping
contents	Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks struggle with complex dataflows and the resulting resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled with LLM training or inference engines, making it difficult to support custom-designed engines. To address these challenges, we propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training. Specifically, we introduce a distributed data storage and transfer module that provides unified data management and fine-grained scheduling in a fully streamed manner. This architecture inherently facilitates automated pipeline overlapping among RL tasks and dynamic load balancing. Moreover, we propose a producer-consumer-based asynchronous workflow engineered to minimize computational idleness by strategically deferring the parameter update process within staleness thresholds. Finally, the core capability of AsyncFlow is architecturally decoupled from the underlying training and inference engines and encapsulated by service-oriented user interfaces, offering a modular and customizable user experience. Extensive experiments demonstrate an average 1.59x throughput improvement compared with the state-of-the-art baseline. The architecture presented in this work provides actionable insights for next-generation RL training system designs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_01663
institution	arXiv
publishDate	2025
record_format	arxiv
title	AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2507.01663
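
Illustrative note: the contents field mentions a producer-consumer workflow that defers parameter updates within staleness thresholds. The toy simulation below is only a sketch of that general idea, not AsyncFlow's implementation or API. All names (`Sample`, `run_async_loop`, `max_staleness`) are hypothetical, and it assumes the rollout worker produces exactly one batch per tick so the queue drains every tick.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: int
    policy_version: int  # version of the weights that generated this sample


def run_async_loop(num_steps, batch_size, max_staleness):
    """Simulate a rollout producer and a trainer consumer sharing a queue.

    Hypothetical toy model: the producer keeps generating with its current
    weights and only syncs to the trainer's newest weights when it would
    otherwise be more than `max_staleness` versions behind.
    """
    queue = deque()
    trainer_version = 0   # number of parameter updates the trainer has made
    rollout_version = 0   # weights version currently held by the producer
    next_id = 0
    syncs = 0
    staleness_seen = []

    while trainer_version < num_steps:
        # Producer: defer the weight sync until the staleness bound is exceeded.
        if trainer_version - rollout_version > max_staleness:
            rollout_version = trainer_version
            syncs += 1
        for _ in range(batch_size):
            queue.append(Sample(next_id, rollout_version))
            next_id += 1

        # Consumer: train as soon as a full batch is available.
        if len(queue) >= batch_size:
            batch = [queue.popleft() for _ in range(batch_size)]
            staleness_seen.extend(
                trainer_version - s.policy_version for s in batch
            )
            trainer_version += 1

    return {
        "syncs": syncs,
        "max_staleness_seen": max(staleness_seen),
        "trained_steps": trainer_version,
    }


if __name__ == "__main__":
    for bound in (0, 2):
        print(bound, run_async_loop(num_steps=6, batch_size=2, max_staleness=bound))
```

In this sketch, raising `max_staleness` reduces how often the producer pauses to sync weights, at the cost of training on samples generated by older policy versions. A real system would also have to handle producers and consumers running at different rates.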

Similar Items