Staff View: CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction

Bibliographic Details
Main Authors:	Li, Zixuan; Geng, Binzong; Xiong, Jing; He, Yong; Hu, Yuxuan; Chen, Jian; Chen, Dingwei; Chang, Xiyu; Zhang, Liang; Mo, Linjian; Li, Chengming; Yuan, Chuan; Sun, Zhenan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2508.03668

_version_	1866918115465494528
author	Li, Zixuan; Geng, Binzong; Xiong, Jing; He, Yong; Hu, Yuxuan; Chen, Jian; Chen, Dingwei; Chang, Xiyu; Zhang, Liang; Mo, Linjian; Li, Chengming; Yuan, Chuan; Sun, Zhenan
author_facet	Li, Zixuan; Geng, Binzong; Xiong, Jing; He, Yong; Hu, Yuxuan; Chen, Jian; Chen, Dingwei; Chang, Xiyu; Zhang, Liang; Mo, Linjian; Li, Chengming; Yuan, Chuan; Sun, Zhenan
contents	Click-Through Rate (CTR) prediction, a core task in recommendation systems, estimates user click likelihood using historical behavioral data. Modeling user behavior sequences as text to leverage Language Models (LMs) for this task has gained traction, owing to LMs' strong semantic understanding and contextual modeling capabilities. However, a critical structural gap exists: user behavior sequences consist of discrete actions connected by semantically empty separators, differing fundamentally from the coherent natural language in LM pre-training. This mismatch causes semantic fragmentation, where LM attention scatters across irrelevant tokens instead of focusing on meaningful behavior boundaries and inter-behavior relationships, degrading prediction performance. To address this, we propose $\textit{CTR-Sink}$, a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios. Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information. Specifically, we insert sink tokens between consecutive behaviors, incorporating recommendation-specific signals such as temporal distance to serve as stable attention sinks. To enhance generality, we design a two-stage training strategy that explicitly guides LM attention toward sink tokens and an attention sink mechanism that amplifies inter-sink dependencies to better capture behavioral correlations. Experiments on one industrial dataset and two open-source datasets (MovieLens, KuaiRec), alongside visualization results, validate the method's effectiveness across scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_03668
institution	arXiv
publishDate	2025
record_format	arxiv
title	CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction
topic	Computation and Language
url	https://arxiv.org/abs/2508.03668
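
Illustration (not part of the catalog record or the paper's code): the abstract describes inserting sink tokens, carrying signals such as temporal distance, between consecutive behaviors. A minimal sketch of that input-construction step might look like the following. The Behavior class, the "<SINK>" token string, the "[gap=Nh]" format, and the choice to measure the gap between neighboring behaviors in whole hours are all assumptions made for this example; the record does not specify any of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Behavior:
    """Hypothetical container for one user action."""
    text: str       # textual description of the action, e.g. an item title
    timestamp: int  # Unix time in seconds


def build_sequence(behaviors: List[Behavior], sink_token: str = "<SINK>") -> str:
    """Join behaviors into one string with a sink token between neighbors.

    Each sink token is annotated with the time gap (in whole hours) between
    the two behaviors it separates. This format is an assumption, not the
    paper's actual encoding.
    """
    if not behaviors:
        return ""
    ordered = sorted(behaviors, key=lambda b: b.timestamp)
    parts = [ordered[0].text]
    for prev, cur in zip(ordered, ordered[1:]):
        gap_hours = (cur.timestamp - prev.timestamp) // 3600
        parts.append(f"{sink_token}[gap={gap_hours}h]")
        parts.append(cur.text)
    return " ".join(parts)


history = [
    Behavior("Toy Story", 0),
    Behavior("Heat", 7200),
    Behavior("Casino", 93600),
]
sequence = build_sequence(history)
print(sequence)
```

In a real pipeline the sink token would presumably be registered as a special token in the LM's tokenizer so that it maps to a single position; that step is omitted here.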
