Bibliographic Details
Main Authors: Shen, Zhaiming, Wang, Menglun, Cheng, Guang, Lai, Ming-Jun, Mu, Lin, Huang, Ruihao, Liu, Qi, Zhu, Hao
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2405.03060
author Shen, Zhaiming
Wang, Menglun
Cheng, Guang
Lai, Ming-Jun
Mu, Lin
Huang, Ruihao
Liu, Qi
Zhu, Hao
contents Determining whether test samples follow a distribution similar to that of the training samples is a fundamental question to address before most machine learning models can be safely deployed in practice. In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism that determines whether a set of unseen samples has a distribution similar to that of the training samples. The TOOD detection mechanism is based on computing the pairwise Hamming distance of the testing samples' tree embeddings, which are obtained by fitting a tree-based ensemble model on in-distribution training samples. Our approach is interpretable and robust owing to its tree-based nature. Furthermore, it is efficient, flexible across machine learning tasks, and easily generalized to the unsupervised setting. Extensive experiments show that the proposed method outperforms other state-of-the-art out-of-distribution detection methods in distinguishing in-distribution from out-of-distribution data on various tabular, image, and text datasets.
format Preprint
id arxiv_https___arxiv_org_abs_2405_03060
institution arXiv
publishDate 2024
record_format arxiv
title Tree-based Ensemble Learning for Out-of-distribution Detection
topic Machine Learning
url https://arxiv.org/abs/2405.03060
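
The mechanism summarized in the abstract (fit a tree ensemble on in-distribution data, embed each test sample by the leaves it reaches, then compare embeddings by Hamming distance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the ensemble is a stand-in built from random axis-aligned splits with thresholds drawn from the training data, and the names `fit_random_trees`, `tree_embedding`, `hamming`, and `mean_pairwise_hamming` are hypothetical.

```python
import random
from itertools import combinations


def fit_random_trees(train, n_trees=50, depth=3, seed=0):
    """Stand-in ensemble (assumption, not the paper's model): each tree is a
    list of (feature, threshold) splits, one per level, with thresholds taken
    from randomly chosen in-distribution training values."""
    rng = random.Random(seed)
    n_features = len(train[0])
    trees = []
    for _ in range(n_trees):
        splits = []
        for _ in range(depth):
            feature = rng.randrange(n_features)
            threshold = rng.choice(train)[feature]
            splits.append((feature, threshold))
        trees.append(splits)
    return trees


def tree_embedding(trees, x):
    """Map a sample to a tuple of leaf ids, one id per tree."""
    return tuple(
        sum((x[f] > t) << level for level, (f, t) in enumerate(splits))
        for splits in trees
    )


def hamming(a, b):
    """Fraction of trees in which two samples land in different leaves."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def mean_pairwise_hamming(trees, samples):
    """Average Hamming distance over all pairs of samples in a batch."""
    embeddings = [tree_embedding(trees, x) for x in samples]
    pairs = list(combinations(embeddings, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy data: 2-D standard Gaussian as the in-distribution source.
rng = random.Random(1)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
in_batch = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
# A batch shifted far outside the training range.
shifted_batch = [(rng.gauss(0, 1) + 100, rng.gauss(0, 1) + 100) for _ in range(100)]

trees = fit_random_trees(train)
in_score = mean_pairwise_hamming(trees, in_batch)
shifted_score = mean_pairwise_hamming(trees, shifted_batch)
print(in_score, shifted_score)
```

In this toy setup every shifted sample exceeds every threshold, so the whole shifted batch falls into a single leaf per tree and its mean pairwise distance collapses to zero, while the in-distribution batch spreads across leaves. How such a score is thresholded, and which ensemble is fitted, is specified in the paper itself rather than here.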