Saved in:
Bibliographic Details
Main Authors: Harris, David G., Pensyl, Thomas, Srinivasan, Aravind, Trinh, Khoa
Format: Preprint
Published: 2017
Subjects:
Online Access:https://arxiv.org/abs/1709.06995
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on fairness in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with "knapsack" and "partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds can be useful in inferring properties of large networks using few samples.