Bibliographic Details
Main Authors: Wu, Junyu, Chang, Weiming, Liu, Xiaotao, He, Guanyou, Xian, Tingfeng, Hong, Haoqiang, Chen, Boqi, Tian, Hongtao, Yang, Tao, Shi, Yunsheng, Lin, Feng, Yao, Ting, Xu, Jiatao
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2508.07970
Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a prominent paradigm for training large language models and multimodal systems. Despite the notable advances enabled by existing RLHF training frameworks, significant challenges remain in scaling to complex multimodal workflows and adapting to dynamic workloads. In particular, current systems often run into controller scalability limits when managing large models, as well as inefficiencies in orchestrating intricate RLHF pipelines, especially in scenarios that require dynamic sampling and resource allocation. In this paper, we introduce WeChat-YATT (Yet Another Transformer Trainer in WeChat), a simple, scalable, and balanced RLHF training framework designed to address these challenges. WeChat-YATT features a parallel controller programming model that enables flexible and efficient orchestration of complex RLHF workflows, mitigating the bottlenecks of centralized controller architectures and supporting scalability in large-scale data scenarios. In addition, we propose a dynamic placement schema that adaptively partitions computational resources and schedules workloads, significantly reducing hardware idle time and improving GPU utilization under variable training conditions. We evaluate WeChat-YATT across diverse experimental scenarios and demonstrate substantial throughput improvements over state-of-the-art RLHF training frameworks. Furthermore, WeChat-YATT has been successfully deployed to train models that support WeChat product features for a large-scale user base, underscoring its effectiveness and robustness in real-world applications. We have made WeChat-YATT publicly available at https://www.github.com/tencent/WeChat-YATT.
Institution: arXiv
Title: WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
Subjects: Machine Learning, Artificial Intelligence