| Main Authors: | Naeini, Mohammadreza Tavasoli; Bereyhi, Ali; Noshad, Morteza; Liang, Ben; Hero III, Alfred O. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
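| Subjects: | Machine Learning; Computer Vision and Pattern Recognition; Information Theory; Image and Video Processing; Signal Processing |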
| Online Access: | https://arxiv.org/abs/2501.07754 |
| _version_ | 1866913647837577216 |
|---|---|
| author | Naeini, Mohammadreza Tavasoli; Bereyhi, Ali; Noshad, Morteza; Liang, Ben; Hero III, Alfred O. |
| author_facet | Naeini, Mohammadreza Tavasoli; Bereyhi, Ali; Noshad, Morteza; Liang, Ben; Hero III, Alfred O. |
| contents | This work invokes the notion of $f$-divergence to introduce a novel upper bound on the Bayes error rate of a general classification task. We show that the proposed bound can be computed by sampling from the output of a parameterized model. Using this practical interpretation, we introduce the Bayes optimal learning threshold (BOLT) loss whose minimization enforces a classification model to achieve the Bayes error rate. We validate the proposed loss for image and text classification tasks, considering MNIST, Fashion-MNIST, CIFAR-10, and IMDb datasets. Numerical experiments demonstrate that models trained with BOLT achieve performance on par with or exceeding that of cross-entropy, particularly on challenging datasets. This highlights the potential of BOLT in improving generalization. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2501_07754 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Universal Training of Neural Networks to Achieve Bayes Optimal Classification Accuracy |
| topic | Machine Learning; Computer Vision and Pattern Recognition; Information Theory; Image and Video Processing; Signal Processing; I.2.6; I.5.4 |
| url | https://arxiv.org/abs/2501.07754 |