Bibliographic Details
Main Authors: Hibshman, Justus Isaiah, Hoq, Adnan, Weninger, Tim
Format: Preprint
Published: 2024
Subjects: Databases
Online Access: https://arxiv.org/abs/2404.13489
author Hibshman, Justus Isaiah
Hoq, Adnan
Weninger, Tim
contents Real-world data is typically a noisy manifestation of a core pattern (schema), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e., decomposing) the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is, and how well the combination of the two represents the original graph data. We visually demonstrate what this metric prioritizes in small graphs, then show that when SCHENO is used as the fitness function for a simple optimization strategy, it can uncover a wide variety of patterns. Finally, we evaluate several well-known graph mining algorithms with this metric; we find that although they produce patterns, those patterns are not always the best representation of the input data.
format Preprint
id arxiv_https___arxiv_org_abs_2404_13489
institution arXiv
publishDate 2024
record_format arxiv
title SCHENO: Measuring Schema vs. Noise in Graphs
topic Databases
68R10, 68T10, 08A35
url https://arxiv.org/abs/2404.13489