Bibliographic Details
Main Author: Falcao, Andre O.
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2411.19702
_version_ 1866916499578421248
author Falcao, Andre O.
author_facet Falcao, Andre O.
contents Mutual Information (MI) is a powerful statistical measure that quantifies the information shared between random variables, and it is particularly valuable in high-dimensional data analysis in fields such as genomics, natural language processing, and network science. However, computing MI becomes computationally prohibitive for large datasets, where it typically requires a pairwise approach in which each column is compared with every other column. This work introduces a matrix-based algorithm that accelerates MI computation by leveraging vectorized operations and optimized matrix calculations. By transforming traditional pairwise computations into bulk matrix operations, the proposed method enables efficient MI calculation across all variable pairs. Experimental results demonstrate significant performance improvements, with computation times reduced by a factor of up to 50,000 on the largest dataset using optimized implementations, particularly when utilizing hardware-optimized frameworks. The approach promises to expand MI's applicability in data-driven research by overcoming previous computational limitations.
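The abstract's central idea, replacing per-pair loops with bulk matrix operations, can be illustrated with a minimal NumPy sketch for binary (0/1) data. This is not the preprint's implementation: the function name pairwise_mi_binary, the use of base-2 logarithms, and the convention that 0 * log(0) counts as 0 are all assumptions made for illustration. The sketch obtains the four joint count tables for every column pair from four matrix products, then sums the MI terms elementwise.

```python
import numpy as np

def pairwise_mi_binary(X):
    """Mutual information (in bits) for every pair of columns of a 0/1 matrix.

    Hypothetical helper for illustration; not taken from the preprint.
    """
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    Xc = 1.0 - X  # complement: 1 where the original entry is 0

    # Joint counts for all column pairs (i, j) via matrix products.
    n11 = X.T @ X     # rows where column i = 1 and column j = 1
    n10 = X.T @ Xc    # rows where column i = 1 and column j = 0
    n01 = Xc.T @ X    # rows where column i = 0 and column j = 1
    n00 = Xc.T @ Xc   # rows where column i = 0 and column j = 0

    p1 = X.mean(axis=0)  # marginal P(column = 1)
    p0 = 1.0 - p1        # marginal P(column = 0)

    mi = np.zeros_like(n11)
    cells = ((n11, p1, p1), (n10, p1, p0), (n01, p0, p1), (n00, p0, p0))
    for joint, pa, pb in cells:
        pj = joint / n                # joint probability of this outcome
        expected = np.outer(pa, pb)   # product of the two marginals
        with np.errstate(divide="ignore", invalid="ignore"):
            term = pj * np.log2(pj / expected)
        # Assumed convention: outcomes with zero joint probability add 0.
        mi += np.where(pj > 0, term, 0.0)
    return mi
```

Called on an array with n rows and d columns, the function returns a symmetric d-by-d matrix whose diagonal holds each column's entropy. The four products dominate the cost, which is why an optimized BLAS or GPU backend would be expected to help in a formulation like this one.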
format Preprint
id arxiv_https___arxiv_org_abs_2411_19702
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Fast Mutual Information Computation for Large Binary Datasets
Falcao, Andre O.
Machine Learning
Information Theory
Numerical Analysis
title Fast Mutual Information Computation for Large Binary Datasets
topic Machine Learning
Information Theory
Numerical Analysis
url https://arxiv.org/abs/2411.19702