Bibliographic Details
Main authors: Chai, Zheng, Ren, Qin, Xiao, Xijun, Yang, Huizhi, Han, Bo, Zhang, Sijun, Chen, Di, Lu, Hui, Zhao, Wenlin, Yu, Lele, Xie, Xionghang, Ren, Shiru, Sun, Xiang, Tan, Yaocheng, Xu, Peng, Zheng, Yuchao, Wu, Di
Format: Preprint
Published: 2025
Subjects:
Online access: https://arxiv.org/abs/2505.04421
author Chai, Zheng
Ren, Qin
Xiao, Xijun
Yang, Huizhi
Han, Bo
Zhang, Sijun
Chen, Di
Lu, Hui
Zhao, Wenlin
Yu, Lele
Xie, Xionghang
Ren, Shiru
Sun, Xiang
Tan, Yaocheng
Xu, Peng
Zheng, Yuchao
Wu, Di
contents Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and a hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including mixed-precision training with activation recomputation, KV cache serving, and a fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B testing across advertising and e-commerce services at ByteDance, validating its effectiveness and industrial-level scaling laws. Currently, LONGER is fully deployed in more than 10 influential scenarios at ByteDance, serving users at the billion scale.
format Preprint
id arxiv_https___arxiv_org_abs_2505_04421
institution arXiv
publishDate 2025
record_format arxiv
title LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
topic Information Retrieval
url https://arxiv.org/abs/2505.04421
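
The abstract states that the token merge module reduces the quadratic complexity of attention over long sequences. The following is a minimal sketch of that idea, not the authors' implementation: it assumes that merging means concatenating each group of adjacent token embeddings into one wider token, which the abstract does not specify. The names `merge_tokens`, `attention_pair_count`, and `group_size` are hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]


def merge_tokens(sequence: List[Vector], group_size: int) -> List[Vector]:
    """Concatenate every `group_size` adjacent token vectors into one wider token.

    Assumption (not from the abstract): merging is done by concatenation, and
    a trailing partial group is zero-padded so all merged tokens share a width.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    if not sequence:
        return []
    dim = len(sequence[0])
    merged: List[Vector] = []
    for start in range(0, len(sequence), group_size):
        group = sequence[start:start + group_size]
        # Flatten the group into a single vector of width group_size * dim.
        flat = [x for token in group for x in token]
        # Zero-pad the last group if the sequence length is not a multiple.
        flat.extend([0.0] * (group_size * dim - len(flat)))
        merged.append(flat)
    return merged


def attention_pair_count(num_tokens: int) -> int:
    """Number of query-key pairs in full self-attention (the quadratic term)."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # A toy sequence of 2000 behaviors with 2-dimensional embeddings.
    seq = [[float(i), float(-i)] for i in range(2000)]
    merged = merge_tokens(seq, group_size=4)
    print(len(seq), attention_pair_count(len(seq)))        # 2000 4000000
    print(len(merged), attention_pair_count(len(merged)))  # 500 250000
```

With a group size of K, the number of query-key pairs falls by a factor of K squared, while each merged token is K times wider, so under this concatenation assumption the cost of the attention score computation falls by roughly a factor of K.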