Bibliographic Details
Main Authors: Sarasa, Guillermo, Granados, Ana, Rodríguez, Francisco de Borja
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2508.14780
Table of Contents:
  • Compression-based dissimilarities (CD) offer a flexible, domain-agnostic means of measuring similarity by identifying implicit information through redundancies between data objects. However, because the similarity features are derived from the data rather than defined as an input, they often prove difficult to align with the task at hand, particularly in complex clustering or classification settings. To address this issue, we introduce "context steering", a novel methodology that actively guides the feature-shaping process. Instead of passively accepting the emergent data structure (typically a hierarchy derived from clustering CDs), our approach "steers" the process by systematically analyzing how each object influences the relational context within a clustering framework. This process generates a custom-tailored embedding that isolates and amplifies class-distinctive information. We validate this supervised context-steering strategy using Normalized Compression Distance (NCD) and Relative Compression Distance (NRC) combined with hierarchical clustering, and evaluate the learned embeddings through both classification performance and cluster-quality metrics. Experiments on heterogeneous datasets, from text to real-world audio, show that the proposed approach yields robust task-oriented embeddings from compression dissimilarities, moving from traditional transductive uses of distance matrices to an inductive representation that can be applied to unseen data.
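For readers unfamiliar with the Normalized Compression Distance named in the abstract, the following is a minimal sketch of its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. It does not reproduce the context-steering method itself. The choice of zlib as the compressor and the helper names `compressed_size`, `ncd`, and `ncd_matrix` are assumptions made for illustration, not details taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Assumption: zlib at maximum level stands in for the compressor C.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Standard NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(objects):
    # Pairwise dissimilarity matrix; the diagonal is set to 0 by convention
    # and only the upper triangle is computed, then mirrored.
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(objects[i], objects[j])
    return d


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog. " * 20
    b = a.replace(b"fox", b"cat")   # near-duplicate of a
    c = bytes(range(256)) * 4       # structurally unrelated data
    for row in ncd_matrix([a, b, c]):
        print(["%.3f" % v for v in row])
```

A matrix like this is the kind of input that hierarchical clustering would consume in the transductive setting the abstract describes; values near 0 indicate highly redundant objects, and values near 1 indicate little shared structure.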