Bibliographic Details
Main Authors: Li, Zixuan, Geng, Binzong, Xiong, Jing, He, Yong, Hu, Yuxuan, Chen, Jian, Chen, Dingwei, Chang, Xiyu, Zhang, Liang, Mo, Linjian, Li, Chengming, Yuan, Chuan, Sun, Zhenan
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2508.03668
author Li, Zixuan
Geng, Binzong
Xiong, Jing
He, Yong
Hu, Yuxuan
Chen, Jian
Chen, Dingwei
Chang, Xiyu
Zhang, Liang
Mo, Linjian
Li, Chengming
Yuan, Chuan
Sun, Zhenan
contents Click-Through Rate (CTR) prediction, a core task in recommendation systems, estimates user click likelihood using historical behavioral data. Modeling user behavior sequences as text to leverage Language Models (LMs) for this task has gained traction, owing to LMs' strong semantic understanding and contextual modeling capabilities. However, a critical structural gap exists: user behavior sequences consist of discrete actions connected by semantically empty separators, differing fundamentally from the coherent natural language in LM pre-training. This mismatch causes semantic fragmentation, where LM attention scatters across irrelevant tokens instead of focusing on meaningful behavior boundaries and inter-behavior relationships, degrading prediction performance. To address this, we propose CTR-Sink, a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios. Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information. Specifically, we insert sink tokens between consecutive behaviors, incorporating recommendation-specific signals such as temporal distance to serve as stable attention sinks. To enhance generality, we design a two-stage training strategy that explicitly guides LM attention toward sink tokens and an attention sink mechanism that amplifies inter-sink dependencies to better capture behavioral correlations. Experiments on one industrial dataset and two open-source datasets (MovieLens, Kuairec), alongside visualization results, validate the method's effectiveness across scenarios.
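The input-construction step described in the abstract (a sink token between each pair of consecutive behaviors, carrying a signal such as temporal distance) can be illustrated with a minimal sketch. The abstract does not specify the sink token format, how temporal distance is encoded, or any bucket boundaries, so the names Behavior, time_bucket and build_sequence_with_sinks, the "[SINK gap=...]" format, and the hour/day/week thresholds below are all hypothetical. This is not the authors' implementation, and it omits the two-stage training strategy and the inter-sink attention mechanism entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Behavior:
    item_text: str  # textual description of the interacted item (hypothetical field)
    timestamp: int  # interaction time as Unix seconds (hypothetical field)


def time_bucket(gap_seconds: int) -> str:
    """Map the gap between two behaviors to a coarse label (hypothetical boundaries)."""
    hours = gap_seconds / 3600
    if hours < 1:
        return "<1h"
    if hours < 24:
        return "<1d"
    if hours < 24 * 7:
        return "<1w"
    return ">=1w"


def build_sequence_with_sinks(behaviors: List[Behavior]) -> str:
    """Serialize behaviors chronologically with one sink token between each consecutive pair.

    Each sink token carries the temporal distance to the previous behavior, standing in
    for the recommendation-specific signal the abstract mentions. The token format is assumed.
    """
    ordered = sorted(behaviors, key=lambda b: b.timestamp)
    parts: List[str] = []
    for i, behavior in enumerate(ordered):
        if i > 0:
            # Assumed sink token: encodes the time gap since the previous behavior.
            gap = behavior.timestamp - ordered[i - 1].timestamp
            parts.append(f"[SINK gap={time_bucket(gap)}]")
        parts.append(behavior.item_text)
    return " ".join(parts)


if __name__ == "__main__":
    history = [
        Behavior("Toy Story", 0),
        Behavior("Heat", 1800),
        Behavior("Up", 1800 + 3 * 86400),
    ]
    print(build_sequence_with_sinks(history))
    # Toy Story [SINK gap=<1h] Heat [SINK gap=<1w] Up
```

In a real LM pipeline such sink strings would plausibly be registered as dedicated tokens in the tokenizer so that each sink occupies a single position the model can attend to; this sketch stops at the text level and makes no claim about how CTR-Sink does this.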
format Preprint
id arxiv_https___arxiv_org_abs_2508_03668
institution arXiv
publishDate 2025
record_format arxiv
title CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction
topic Computation and Language
url https://arxiv.org/abs/2508.03668