Staff View: Understanding Interpretability by generalized distillation in Supervised Classification :: Library Catalog

Bibliographic Details
Main Authors:	Agarwal, Adit; Shukla, K. K.; Kuijper, Arjan; Mukhopadhyay, Anirban
Format:	Preprint
Published:	2020
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2012.03089

_version_	1866909302605742080
author	Agarwal, Adit; Shukla, K. K.; Kuijper, Arjan; Mukhopadhyay, Anirban
author_facet	Agarwal, Adit; Shukla, K. K.; Kuijper, Arjan; Mukhopadhyay, Anirban
contents	The ability to interpret decisions taken by Machine Learning (ML) models is fundamental to encourage trust and reliability in different practical applications. Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of the complex ML models. However, these strategies are restricted by the subjective biases of humans. To dissociate from such human biases, we propose an interpretation-by-distillation formulation that is defined relative to other ML models. We generalize the distillation technique for quantifying interpretability, using an information-theoretic perspective, removing the role of ground-truth from the definition of interpretability. Our work defines the entropy of supervised classification models, providing bounds on the entropy of Piece-Wise Linear Neural Networks (PWLNs), along with the first theoretical bounds on the interpretability of PWLNs. We evaluate our proposed framework on the MNIST, Fashion-MNIST and Stanford40 datasets and demonstrate the applicability of the proposed theoretical framework in different supervised classification scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2012_03089
institution	arXiv
publishDate	2020
record_format	arxiv
title	Understanding Interpretability by generalized distillation in Supervised Classification
topic	Machine Learning
url	https://arxiv.org/abs/2012.03089

Similar Items