Staff View: Correcting the LogQ Correction: Revisiting Sampled Softmax for Large-Scale Retrieval :: Library Catalog

Bibliographic Details
Main Authors:	Khrylchenko, Kirill; Baikalov, Vladimir; Makeev, Sergei; Matveev, Artem; Liamaev, Sergei
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval
Online Access:	https://arxiv.org/abs/2507.09331

_version_	1866913960158035968
author	Khrylchenko, Kirill; Baikalov, Vladimir; Makeev, Sergei; Matveev, Artem; Liamaev, Sergei
author_facet	Khrylchenko, Kirill; Baikalov, Vladimir; Makeev, Sergei; Matveev, Artem; Liamaev, Sergei
contents	Two-tower neural networks are a popular architecture for the retrieval stage in recommender systems. These models are typically trained with a softmax loss over the item catalog. However, in web-scale settings, the item catalog is often prohibitively large, making full softmax infeasible. A common solution is sampled softmax, which approximates the full softmax using a small number of sampled negatives. One practical and widely adopted approach is to use in-batch negatives, where negatives are drawn from items in the current mini-batch. However, this introduces a bias: items that appear more frequently in the batch (i.e., popular items) are penalized more heavily. To mitigate this issue, a popular industry technique known as logQ correction adjusts the logits during training by subtracting the log-probability of an item appearing in the batch. This correction is derived by analyzing the bias in the gradient and applying importance sampling, effectively twice, using the in-batch distribution as a proposal distribution. While this approach improves model quality, it does not fully eliminate the bias. In this work, we revisit the derivation of logQ correction and show that it overlooks a subtle but important detail: the positive item in the denominator is not Monte Carlo-sampled - it is always present with probability 1. We propose a refined correction formula that accounts for this. Notably, our loss introduces an interpretable sample weight that reflects the model's uncertainty - the probability of misclassification under the current parameters. We evaluate our method on both public and proprietary datasets, demonstrating consistent improvements over the standard logQ correction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_09331
institution	arXiv
publishDate	2025
record_format	arxiv
title	Correcting the LogQ Correction: Revisiting Sampled Softmax for Large-Scale Retrieval
topic	Information Retrieval
url	https://arxiv.org/abs/2507.09331
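
Illustrative note on the "contents" field: the abstract describes standard logQ correction as subtracting the log-probability of an item appearing in the batch from its logit before the in-batch softmax. The sketch below shows only that standard correction, in NumPy. It is not the authors' refined formula, which this record does not give. The function names, the dot-product scoring, and the assumption that row i's positive is item i of the batch are all hypothetical choices made for illustration.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def in_batch_softmax_loss(user_emb, item_emb, log_q=None):
    """Mean in-batch sampled softmax loss (illustrative sketch).

    user_emb: (B, D) user-tower outputs.
    item_emb: (B, D) item-tower outputs; row i is the positive for user i
              (an assumption of this sketch).
    log_q:    optional (B,) estimated log-probability of each item
              appearing in a batch. When given, the standard logQ
              correction is applied to every logit, positive included.
    """
    logits = user_emb @ item_emb.T  # (B, B) user-item dot-product scores
    if log_q is not None:
        # Standard logQ correction: subtract log Q(item) column-wise.
        logits = logits - log_q[None, :]
    log_probs = log_softmax(logits, axis=1)
    # The positive for row i sits on the diagonal.
    return -np.mean(np.diag(log_probs))
```

If every item has the same sampling probability, the correction shifts each row of logits by a constant and leaves the loss unchanged; it only matters when some items (typically popular ones) appear in batches more often than others.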

Similar Items