Bibliographic Details
Main Authors: Pardoe, David; Daftary, Neil; Furtado, Miro; Aiyer, Aditya; Wang, Yu; Li, Liuqing; Song, Tao; Hertel, Lars; Yun, Young Jin; Radhakrishnan, Senthil; Wang, Zhiwei; Li, Tommy; Tran, Khai; Nagarajan, Ananth; Naqvi, Ali; Zhang, Yue; Fang, Renpeng; Romascanu, Avi; Kulothungun, Arjun; Kumar, Deepak; Boda, Praneeth; Borisyuk, Fedor; Wang, Ruoyan
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.11410
_version_ 1866910019907223552
author Pardoe, David
Daftary, Neil
Furtado, Miro
Aiyer, Aditya
Wang, Yu
Li, Liuqing
Song, Tao
Hertel, Lars
Yun, Young Jin
Radhakrishnan, Senthil
Wang, Zhiwei
Li, Tommy
Tran, Khai
Nagarajan, Ananth
Naqvi, Ali
Zhang, Yue
Fang, Renpeng
Romascanu, Avi
Kulothungun, Arjun
Kumar, Deepak
Boda, Praneeth
Borisyuk, Fedor
Wang, Ruoyan
contents Click-through rate (CTR) prediction is fundamental to online advertising systems. While Deep Learning Recommendation Models (DLRMs) with explicit feature interactions have long dominated this domain, recent advances in generative recommenders have shown promising results in content recommendation. However, adapting these transformer-based architectures to ads CTR prediction still presents unique challenges, including handling post-scoring contextual signals, maintaining offline-online consistency, and scaling to industrial workloads. We present CADET (Context-Conditioned Ads Decoder-Only Transformer), an end-to-end decoder-only transformer for ads CTR prediction deployed at LinkedIn. Our approach introduces several key innovations: (1) a context-conditioned decoding architecture with multi-tower prediction heads that explicitly model post-scoring signals such as ad position, resolving the chicken-and-egg problem between predicted CTR and ranking; (2) a self-gated attention mechanism that stabilizes training by adaptively regulating information flow at both representation and interaction levels; (3) a timestamp-based variant of Rotary Position Embedding (RoPE) that captures temporal relationships across timescales from seconds to months; (4) session masking strategies that prevent the model from learning dependencies on unavailable in-session events, addressing train-serve skew; and (5) production engineering techniques including tensor packing, sequence chunking, and custom Flash Attention kernels that enable efficient training and serving at scale. In online A/B testing, CADET achieves an 11.04% CTR lift compared to the production LiRank baseline model, a hybrid ensemble of DCNv2 and sequential encoders. The system has been successfully deployed on LinkedIn's advertising platform, serving the main traffic for homefeed sponsored updates.
format Preprint
id arxiv_https___arxiv_org_abs_2602_11410
institution arXiv
publishDate 2026
record_format arxiv
title CADET: Context-Conditioned Ads CTR Prediction With a Decoder-Only Transformer
topic Machine Learning
url https://arxiv.org/abs/2602.11410