Staff View: The Effects of Data Split Strategies on the Offline Experiments for CTR Prediction :: Library Catalog

Bibliographic Details
Main Authors:	Turksoy, Ramazan Tarik; Turkmen, Beyza
Format:	Preprint
Published:	2024
Subjects:	Information Retrieval
Online Access:	https://arxiv.org/abs/2406.18320

_version_	1866916302056062976
author	Turksoy, Ramazan Tarik; Turkmen, Beyza
author_facet	Turksoy, Ramazan Tarik; Turkmen, Beyza
contents	Click-through rate (CTR) prediction is a crucial task in online advertising to recommend products that users are likely to be interested in. To identify the best-performing models, rigorous model evaluation is necessary. Offline experimentation plays a significant role in selecting models for live user-item interactions, despite the value of online experimentation like A/B testing, which has its own limitations and risks. Often, the correlation between offline performance metrics and actual online model performance is inadequate. One main reason for this discrepancy is the common practice of using random splits to create training, validation, and test datasets in CTR prediction. In contrast, real-world CTR prediction follows a temporal order. Therefore, the methodology used in offline evaluation, particularly the data splitting strategy, is crucial. This study aims to address the inconsistency between current offline evaluation methods and real-world use cases, by focusing on data splitting strategies. To examine the impact of different data split strategies on offline performance, we conduct extensive experiments using both random and temporal splits on a large open benchmark dataset, Criteo.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_18320
institution	arXiv
publishDate	2024
record_format	arxiv
title	The Effects of Data Split Strategies on the Offline Experiments for CTR Prediction
topic	Information Retrieval
url	https://arxiv.org/abs/2406.18320
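
The abstract in this record contrasts random and temporal data splits for offline CTR evaluation. The following is a minimal sketch of that distinction, not the authors' experimental code. The function names, the `timestamp` and `clicked` fields, and the toy impression log are all hypothetical and do not reflect the Criteo schema.

```python
import random


def random_split(rows, test_fraction, seed=0):
    """Shuffle the rows, then hold out a fraction as the test set.

    Train and test rows are interleaved in time, so the model may be
    evaluated on impressions older than some it was trained on.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def temporal_split(rows, test_fraction, time_key="timestamp"):
    """Sort the rows by time, then hold out the most recent fraction.

    Every test row is at least as recent as every training row, which
    mirrors how a deployed model only ever scores future traffic.
    """
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy impression log: one row per ad impression.
rows = [{"timestamp": t, "clicked": t % 3 == 0} for t in range(10)]

train_r, test_r = random_split(rows, test_fraction=0.2)
train_t, test_t = temporal_split(rows, test_fraction=0.2)
```

With the temporal split, the latest training timestamp never exceeds the earliest test timestamp. The random split gives no such guarantee, which is the mismatch with real-world use that the abstract describes.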