Bibliographic Details
Main Authors: Jiménez, Rubén; Pujol, Oriol
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.20773
author Jiménez, Rubén
Pujol, Oriol
author_facet Jiménez, Rubén
Pujol, Oriol
contents Deployed machine learning systems must continuously evolve as data, architectures, and regulations change, often without access to original training data or model internals. In such settings, black-box copying provides a practical refactoring mechanism, i.e. upgrading legacy models by learning replicas from input-output queries alone. When restricted to hard-label outputs, copying turns into a discontinuous surface reconstruction problem from pointwise queries, severely limiting the ability to recover boundary geometry efficiently. We propose a distance-based copying (distillation) framework that replaces hard-label supervision with signed distances to the teacher's decision boundary, converting copying into a smooth regression problem that exploits local geometry. We develop an $α$-governed smoothing and regularization scheme with Hölder/Lipschitz control over the induced target surface, and introduce two model-agnostic algorithms to estimate signed distances under label-only access. Experiments on synthetic problems and UCI benchmarks show consistent improvements in fidelity and generalization accuracy over hard-label baselines, while enabling distance outputs as uncertainty-related signals for black-box replicas.
format Preprint
id arxiv_https___arxiv_org_abs_2601_20773
institution arXiv
publishDate 2026
record_format arxiv
title Smoothing the Black-Box: Signed-Distance Supervision for Black-Box Model Copying
topic Machine Learning
url https://arxiv.org/abs/2601.20773