Bibliographic Details
Main Author: Pepin, Bob
Format: Preprint
Published: 2020
Subjects: Discrete Mathematics
Online Access: https://arxiv.org/abs/2011.04072
author Pepin, Bob
author_facet Pepin, Bob
contents This short note introduces the harmonic indel distance (HID), a new distance between strings in which the cost of an insertion or deletion is inversely proportional to the string length. We present a closed-form formula and show that the HID is a proper metric. We then compare HID experimentally with normalized and unnormalized versions of the indel distance on benchmark tasks for biomedical sequence data. Finally, we show planar embeddings of the benchmark datasets to give some insight into the geometry of the metric spaces associated with the different distances.
format Preprint
id arxiv_https___arxiv_org_abs_2011_04072
institution arXiv
publishDate 2020
record_format arxiv
title The Harmonic Indel Distance
topic Discrete Mathematics
url https://arxiv.org/abs/2011.04072