Bibliographic details
Main authors: Hong, Juhee; Liu, Meng; Wang, Shengzhi; Mao, Xiaoheng; Cheng, Huihui; Gao, Leon; Leung, Christopher; Zhou, Jin; Sekar, Chandra Mouli; Zhu, Zhao; Liu, Ruochen; Trieu, Tuan; Sun, Dawei; Kanjani, Jeet; Li, Rui; Qian, Jing; Cao, Xuan; Fan, Minjie; Gao, Mingze
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online access: https://arxiv.org/abs/2511.21095
author Hong, Juhee
Liu, Meng
Wang, Shengzhi
Mao, Xiaoheng
Cheng, Huihui
Gao, Leon
Leung, Christopher
Zhou, Jin
Sekar, Chandra Mouli
Zhu, Zhao
Liu, Ruochen
Trieu, Tuan
Sun, Dawei
Kanjani, Jeet
Li, Rui
Qian, Jing
Cao, Xuan
Fan, Minjie
Gao, Mingze
contents Large-scale recommendation systems commonly adopt a multi-stage cascading ranking paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems use the "user-item decoupling" approach, in which independently learned user and item representations are combined only at the final layer. While efficient, this design is limited in effectiveness, as it struggles to capture fine-grained user-item affinities and cross-signals. To address these limitations, we propose the Generative Early Stage Ranking (GESR) paradigm, introducing the Mixture of Attention (MoA) module, which leverages diverse attention mechanisms to bridge the effectiveness gap: the Hard Matching Attention (HMA) module encodes explicit cross-signals by computing raw match counts between user and item features; the Target-Aware Self Attention module generates target-aware user representations conditioned on the item, enabling more personalized learning; and the Cross Attention modules facilitate earlier and richer interactions between user and item features. MoA's specialized attention encodings are further refined in the final layer through a Multi-Logit Parameterized Gating (MLPG) module, which integrates the newly learned embeddings via gating and produces secondary logits that are fused with the primary logit. To address the efficiency and latency challenges, we introduce a comprehensive suite of optimization techniques. These range from custom kernels that maximize the capabilities of the latest hardware to efficient serving solutions powered by caching mechanisms. The proposed GESR paradigm has shown substantial improvements in topline metrics, engagement, and consumption tasks, as validated by both offline and online experiments. To the best of our knowledge, this marks the first successful deployment of full target-aware attention sequence modeling within an ESR stage at such a scale.
format Preprint
id arxiv_https___arxiv_org_abs_2511_21095
institution arXiv
publishDate 2025
record_format arxiv
title Generative Early Stage Ranking
topic Machine Learning
url https://arxiv.org/abs/2511.21095
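Illustrative note: the abstract in this record describes two mechanisms that a small example may make more concrete: hard matching, which computes raw match counts between user and item features, and gated fusion of secondary logits with a primary logit. The sketch below is not taken from the paper. The function names (`hard_match_counts`, `fuse_logits`), the dictionary-based feature layout, the sigmoid gate, and the additive fusion rule are all assumptions chosen for illustration, and the actual HMA and MLPG modules may differ substantially.

```python
import math


def hard_match_counts(user_history, item, fields):
    """Count, per field, how many history events share the candidate item's value.

    Hypothetical layout: each history event and the item are plain dicts
    mapping a field name to a categorical value.
    """
    counts = {}
    for field in fields:
        target = item.get(field)
        counts[field] = sum(
            1
            for event in user_history
            if target is not None and event.get(field) == target
        )
    return counts


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_logits(primary_logit, secondary_logits, gate_scores):
    """Add sigmoid-gated secondary logits to the primary logit.

    Additive fusion is an assumption; the abstract only states that
    secondary logits are fused with the primary logit.
    """
    if len(secondary_logits) != len(gate_scores):
        raise ValueError("one gate score is required per secondary logit")
    gated = sum(sigmoid(g) * s for g, s in zip(gate_scores, secondary_logits))
    return primary_logit + gated


# Toy data, invented for this example.
history = [
    {"category": "sports", "creator": "a"},
    {"category": "sports", "creator": "b"},
    {"category": "music", "creator": "a"},
]
candidate = {"category": "sports", "creator": "a"}

counts = hard_match_counts(history, candidate, ["category", "creator"])
fused = fuse_logits(0.5, [1.0, -2.0], [0.0, 0.0])
```

In this toy setup, the match counts act as explicit cross-signals that a decoupled user and item tower could not produce on its own, since they depend on the specific candidate item being scored.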