Staff View: Learning to Rank for Non Independent and Identically Distributed Datasets :: Library Catalog

Bibliographic Details
Main Authors:	Cecchetti, Jacopo; Tonellotto, Nicola; Perego, Raffaele
Format:	Digital resource
Language:
Published:	Zenodo 2024
Online Access:	https://doi.org/10.1145/3664190.3672513

_version_	1866902044359524352
author	Cecchetti, Jacopo; Tonellotto, Nicola; Perego, Raffaele
author_facet	Cecchetti, Jacopo; Tonellotto, Nicola; Perego, Raffaele
contents	With growing data privacy concerns, federated machine learning algorithms that preserve the confidentiality of sensitive information while enabling collaborative model training across decentralized data sources are attracting increasing interest. In this paper, we address the problem of collaboratively learning effective ranking models from non-independent and identically distributed (non-IID) training data owned by distinct search clients. We assume that the learning agents cannot access each other's data, and that models learned from local datasets may be biased or underperform because certain document features or query topics are skewed in the learning-to-rank training data. We therefore aim to instill knowledge from other models into the ranking model learned from local data, obtaining a more robust ranker that effectively handles documents and queries underrepresented in the local collection. To achieve this, we explore different methods for merging the ranking models, so that each client obtains a model that excels at ranking documents from the local data distribution but also performs well on queries retrieving documents whose distributions are typical of a partner's node. In particular, our findings suggest that by relying on a linear combination of the local models, we can improve IR model effectiveness by up to +17.92% in NDCG@10 (moving from 0.619 to 0.730) and by up to +19.64% in MAP (moving from 0.713 to 0.853).
format	Digital resource
id	zenodo_https___doi_org_10_1145_3664190_3672513
institution	Zenodo
language
publishDate	2024
publisher	Zenodo
record_format	zenodo
title	Learning to Rank for Non Independent and Identically Distributed Datasets
url	https://doi.org/10.1145/3664190.3672513
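
Note on the abstract: it reports gains from a linear combination of local ranking models, measured with NDCG@10 and MAP. As a rough illustration only, and not the authors' implementation, the Python sketch below interpolates the per-document scores of two rankers and evaluates the result with NDCG@10. The function names, the weight alpha, the exponential gain used in DCG, and the toy data are all assumptions made for this example; the record does not say how the combination is computed.

```python
import math


def combine_scores(local_scores, partner_scores, alpha):
    """Linearly interpolate two rankers' scores for the same documents.

    alpha is a hypothetical mixing weight: 1.0 keeps only the local model,
    0.0 keeps only the partner model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(local_scores) != len(partner_scores):
        raise ValueError("both rankers must score the same documents")
    return [alpha * a + (1.0 - alpha) * b
            for a, b in zip(local_scores, partner_scores)]


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain with exponential gain (an assumed variant)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k=10):
    """NDCG@k for one query: rank documents by score, compare to ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def relative_gain(before, after):
    """Relative improvement in percent when a metric moves from before to after."""
    return (after - before) / before * 100.0


# Toy data (made up): graded relevance labels and two rankers' scores
# for the four candidate documents of a single query.
labels = [2, 0, 1, 0]
local = [0.2, 0.9, 0.4, 0.1]
partner = [0.8, 0.1, 0.5, 0.3]

merged = combine_scores(local, partner, alpha=0.5)
print("local NDCG@10: ", ndcg_at_k(local, labels))
print("merged NDCG@10:", ndcg_at_k(merged, labels))
```

With the rounded MAP values quoted in the abstract, relative_gain(0.713, 0.853) comes out at roughly 19.6%, in line with the reported improvement. In practice the weight would be tuned on held-out queries rather than fixed at 0.5.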

Similar Items