Staff View: A Theory of Interpretable Approximations :: Library Catalog

Bibliographic Details
Main Authors:	Bressan, Marco; Cesa-Bianchi, Nicolò; Esposito, Emmanuel; Mansour, Yishay; Moran, Shay; Thiessen, Maximilian
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.10529

_version_	1866909224635727872
author	Bressan, Marco; Cesa-Bianchi, Nicolò; Esposito, Emmanuel; Mansour, Yishay; Moran, Shay; Thiessen, Maximilian
author_facet	Bressan, Marco; Cesa-Bianchi, Nicolò; Esposito, Emmanuel; Mansour, Yishay; Moran, Shay; Thiessen, Maximilian
contents	Can a deep neural network be approximated by a small decision tree based on simple features? This question and its variants are behind the growing demand for machine learning models that are interpretable by humans. In this work we study such questions by introducing interpretable approximations, a notion that captures the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$. In particular, we consider the approximation of a binary concept $c$ by decision trees based on a simple class $\mathcal{H}$ (e.g., of bounded VC dimension), and use the tree depth as a measure of complexity. Our primary contribution is the following remarkable trichotomy. For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy; (ii) $c$ can be approximated by $\mathcal{H}$ with arbitrary accuracy, but there exists no universal rate that bounds the complexity of the approximations as a function of the accuracy; or (iii) there exists a constant $κ$ that depends only on $\mathcal{H}$ and $c$ such that, for any data distribution and any desired accuracy level, $c$ can be approximated by $\mathcal{H}$ with a complexity not exceeding $κ$. This taxonomy stands in stark contrast to the landscape of supervised classification, which offers a complex array of distribution-free and universally learnable scenarios. We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-free) complexity. We extend our trichotomy to classes $\mathcal{H}$ of unbounded VC dimension and give characterizations of interpretability based on the algebra generated by $\mathcal{H}$.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_10529
institution	arXiv
publishDate	2024
record_format	arxiv
title	A Theory of Interpretable Approximations
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2406.10529

Similar Items