Bibliographic Details
Main Authors: Cecchetti, Jacopo, Tonellotto, Nicola, Perego, Raffaele
Format: Digital resource
Language:
Published: Zenodo 2024
Online Access: https://doi.org/10.1145/3664190.3672513
_version_ 1866902044359524352
author Cecchetti, Jacopo
Tonellotto, Nicola
Perego, Raffaele
contents <p>With growing data privacy concerns, federated machine learning algorithms capable of preserving the confidentiality of sensitive information while enabling collaborative model training across decentralized data sources are attracting increasing interest. In this paper, we address the problem of collaboratively learning effective ranking models from non-independent and identically distributed (non-IID) training data owned by distinct search clients. We assume that the learning agents cannot access each other's data, and that the models learned from local datasets might be biased or underperforming due to a skewed distribution of certain document features or query topics in the learning-to-rank training data. Thus, we aim to instill in the ranking model learned from local data the knowledge from other models to obtain a more robust ranker capable of effectively handling documents and queries underrepresented in the local collection. To achieve this, we explore different methods for merging the ranking models, thus obtaining in each client a model that excels in ranking documents from the local data distribution but also performs well on queries retrieving documents whose distributions are typical of a partner node. In particular, our findings suggest that by relying on a linear combination of the local models, we can improve IR model effectiveness by up to +17.92% in NDCG@10 (moving from 0.619 to 0.730), and by up to +19.64% in MAP (moving from 0.713 to 0.853).</p>
format Digital resource
id zenodo_https___doi_org_10_1145_3664190_3672513
institution Zenodo
language
publishDate 2024
publisher Zenodo
record_format zenodo
title Learning to Rank for Non Independent and Identically Distributed Datasets
url https://doi.org/10.1145/3664190.3672513
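The abstract reports gains from "a linear combination of the local models" but does not say how the combination is formed. The following is a minimal sketch under one possible reading: each client blends, per candidate document, the score of its own ranker with the score of a partner's ranker. The function names (`combine_scores`, `rank`, `relative_gain`), the weight `alpha`, and the toy scores are all hypothetical and are not taken from the paper. The last helper only reproduces the relative-improvement arithmetic quoted in the abstract.

```python
from typing import List, Sequence


def combine_scores(local_scores: Sequence[float],
                   partner_scores: Sequence[float],
                   alpha: float) -> List[float]:
    """Blend two rankers' scores for the same candidate documents.

    alpha is a hypothetical mixing weight: 1.0 trusts only the local
    model, 0.0 trusts only the partner model.
    """
    if len(local_scores) != len(partner_scores):
        raise ValueError("both rankers must score the same candidates")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * l + (1.0 - alpha) * p
            for l, p in zip(local_scores, partner_scores)]


def rank(doc_ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return document ids sorted by descending score."""
    ordered = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ordered]


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of a metric, in percent."""
    return 100.0 * (after - before) / before


# Toy data (invented for illustration only).
docs = ["d1", "d2", "d3"]
local = [0.9, 0.2, 0.4]     # scores from the client's own model
partner = [0.1, 0.9, 0.7]   # scores from a partner's model

mostly_local = rank(docs, combine_scores(local, partner, alpha=0.7))
mostly_partner = rank(docs, combine_scores(local, partner, alpha=0.3))

map_gain = relative_gain(0.713, 0.853)    # MAP figures from the abstract
ndcg_gain = relative_gain(0.619, 0.730)   # NDCG@10 figures from the abstract
```

With these toy scores, shifting `alpha` from 0.7 to 0.3 moves `d2` from last to first place, which illustrates how a partner's model can promote documents the local model underrates. Recomputing the gains from the rounded scores in the abstract gives about 19.64% for MAP and about 17.93% for NDCG@10, so the reported +17.92% was presumably derived from unrounded values.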