Bibliographic Details
Main Authors: Clémençon, Stephan, Irurozki, Ekhine
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.10640
_version_ 1866908827592425472
author Clémençon, Stephan
Irurozki, Ekhine
author_facet Clémençon, Stephan
Irurozki, Ekhine
contents In this article we develop a new method for summarizing a ranking distribution, \textit{i.e.} a probability distribution on the symmetric group $\mathfrak{S}_n$, beyond the classical theory of consensus and Kemeny medians. Based on the notion of \textit{local ranking median}, we introduce the concept of \textit{consensus ranking distribution} (CRD), a sparse mixture model of Dirac masses on $\mathfrak{S}_n$, in order to approximate a ranking distribution with small distortion from a mass transportation perspective. We prove that by choosing the popular Kendall $\tau$ distance as the cost function, the optimal distortion can be expressed as a function of pairwise probabilities, paving the way for the development of efficient learning methods that do not suffer from the lack of vector space structure on $\mathfrak{S}_n$. In particular, we propose a top-down tree-structured statistical algorithm that allows for the progressive refinement of a CRD based on ranking data, from the Dirac mass at a Kemeny median at the root of the tree to the empirical ranking data distribution itself at the end of the tree's exhaustive growth. In addition to the theoretical arguments developed, the relevance of the algorithm is empirically supported by various numerical experiments.
format Preprint
id arxiv_https___arxiv_org_abs_2602_10640
institution arXiv
publishDate 2026
record_format arxiv
title Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning
topic Machine Learning
url https://arxiv.org/abs/2602.10640