Saved in:
Bibliographic Details
Main Authors: Tadić, Tvrtko, Becker, Cassiano, Neville, Jennifer
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2404.10148
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866909271500783616
author Tadić, Tvrtko
Becker, Cassiano
Neville, Jennifer
author_facet Tadić, Tvrtko
Becker, Cassiano
Neville, Jennifer
contents Random Projections have been widely used to generate embeddings for various graph learning tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by random projections when these are applied over the rows of the graph matrix. Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. We specialize our fundamental results to a ranking application by computing the probability of random projections flipping the node ordering induced by their embeddings. We find that, depending on the degree distribution, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the normalized transition matrix is used. With respect to the statistical noise introduced by random projections, we show that cosine similarity produces remarkably more precise approximations.
format Preprint
id arxiv_https___arxiv_org_abs_2404_10148
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Node Similarities under Random Projections: Limits and Pathological Cases
Tadić, Tvrtko
Becker, Cassiano
Neville, Jennifer
Social and Information Networks
Data Structures and Algorithms
Machine Learning
Probability
Random Projections have been widely used to generate embeddings for various graph learning tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by random projections when these are applied over the rows of the graph matrix. Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. We specialize our fundamental results to a ranking application by computing the probability of random projections flipping the node ordering induced by their embeddings. We find that, depending on the degree distribution, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the normalized transition matrix is used. With respect to the statistical noise introduced by random projections, we show that cosine similarity produces remarkably more precise approximations.
title Node Similarities under Random Projections: Limits and Pathological Cases
topic Social and Information Networks
Data Structures and Algorithms
Machine Learning
Probability
url https://arxiv.org/abs/2404.10148