Bibliographic Details
Main Authors: Xu, Songpei, Wang, Shijia, Guo, Da, Guo, Xianwen, Xiao, Qiang, Huang, Bin, Wu, Guanlin, Luo, Chuanjiang
Format: Preprint
Published: 2025
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2502.09888
_version_ 1866912557140279296
author Xu, Songpei
Wang, Shijia
Guo, Da
Guo, Xianwen
Xiao, Qiang
Huang, Bin
Wu, Guanlin
Luo, Chuanjiang
author_facet Xu, Songpei
Wang, Shijia
Guo, Da
Guo, Xianwen
Xiao, Qiang
Huang, Bin
Wu, Guanlin
Luo, Chuanjiang
contents Transformer-based generative models have achieved remarkable success across domains, where they exhibit a variety of scaling laws. However, our extensive experiments reveal two persistent challenges when applying Transformers to recommendation systems: (1) Transformer performance scales poorly with increased computational resources because of structural incompatibilities with recommendation-specific characteristics such as multi-source data heterogeneity; (2) online inference must meet strict latency constraints (tens of milliseconds), which become harder to satisfy as user behavior sequences grow longer and computational demands increase. We propose Climber, an efficient recommendation framework comprising two synergistic components: a model architecture designed for efficient scaling and co-designed acceleration techniques. The model adopts two core innovations: (1) multi-scale sequence extraction, which reduces time complexity by a constant factor and enables more efficient scaling with sequence length; (2) dynamic temperature modulation, which adapts attention distributions to multi-scenario and multi-behavior patterns. Complemented by the acceleration techniques, namely "single user, multiple item" batched processing and memory-efficient Key-Value caching, Climber achieves a 5.15× throughput gain without performance degradation. Comprehensive offline experiments on multiple datasets validate that Climber exhibits a scaling curve closer to ideal. To our knowledge, this is the first publicly documented framework in which controlled model scaling drives continuous online metric growth (12.19% overall lift) without prohibitive resource costs. Climber has been successfully deployed on NetEase Cloud Music, one of China's largest music streaming platforms, serving tens of millions of users daily.
format Preprint
id arxiv_https___arxiv_org_abs_2502_09888
institution arXiv
publishDate 2025
record_format arxiv
title Climber: Toward Efficient Scaling Laws for Large Recommendation Models
topic Information Retrieval
url https://arxiv.org/abs/2502.09888
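
The abstract mentions dynamic temperature modulation that adapts attention distributions to different scenarios and behaviors, but this record does not describe how the temperature is computed. The following is a minimal sketch of the general idea only, a temperature-scaled softmax over attention scores. It is not the authors' implementation: the function name, the scenario names, and the temperature values are all hypothetical.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Softmax over raw attention scores after dividing by a temperature.

    A temperature below 1 sharpens the distribution; above 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-scenario temperatures (assumed values, not from the paper).
SCENARIO_TEMPERATURE = {"daily_mix": 0.5, "radio": 2.0}

scores = [2.0, 1.0, 0.0]  # illustrative raw attention scores
sharp = softmax_with_temperature(scores, SCENARIO_TEMPERATURE["daily_mix"])
flat = softmax_with_temperature(scores, SCENARIO_TEMPERATURE["radio"])
```

With the same raw scores, the lower temperature concentrates attention on the top-scoring position, while the higher temperature spreads it more evenly. In a learned model the temperature would presumably be produced from scenario or behavior features rather than looked up from a fixed table as done here.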