Bibliographic Details
Main Authors: Chai, Zheng; Ren, Qin; Xiao, Xijun; Yang, Huizhi; Han, Bo; Zhang, Sijun; Chen, Di; Lu, Hui; Zhao, Wenlin; Yu, Lele; Xie, Xionghang; Ren, Shiru; Sun, Xiang; Tan, Yaocheng; Xu, Peng; Zheng, Yuchao; Wu, Di
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2505.04421
Table of Contents:
  • Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and a hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including training with mixed precision and activation recomputation, KV cache serving, and a fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B testing across advertising and e-commerce services at ByteDance, validating its effectiveness and industrial-level scaling laws. LONGER has been fully deployed in more than 10 influential scenarios at ByteDance, serving users at the billion scale.
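    A minimal sketch of why merging tokens reduces the quadratic attention cost mentioned in the summary. This is an illustration only, not the authors' implementation: the record does not say how tokens are merged, so concatenating adjacent tokens is an assumption, and the names merge_tokens, attention_pairs, and the group size k are hypothetical.

```python
# Illustrative only: the merge rule (concatenating k adjacent tokens) and all
# names here are assumptions, not taken from the LONGER paper.

def merge_tokens(seq, k):
    """Concatenate every k adjacent token vectors into one wider token.

    If len(seq) is not a multiple of k, the last merged token is narrower.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    merged = []
    for start in range(0, len(seq), k):
        group = seq[start:start + k]
        # Flatten the group of vectors into a single vector.
        merged.append([x for tok in group for x in tok])
    return merged


def attention_pairs(n):
    """Number of query-key pairs scored by full self-attention over n tokens."""
    return n * n


# Toy sequence: 8 tokens, each a 2-dimensional vector.
seq = [[float(i), float(i) + 0.5] for i in range(8)]
merged = merge_tokens(seq, k=4)

print(len(seq), len(merged), len(merged[0]))
print(attention_pairs(len(seq)), attention_pairs(len(merged)))
```

    Under these assumptions, merging with group size k shortens the sequence by a factor of k, so the number of attention pairs falls by a factor of k squared, while each merged token becomes k times wider.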