Bibliographic Details
Main Authors:	Wang, Hsiu-Hsuan; Mai, Tan-Ha; Ye, Nai-Xuan; Lin, Wei-I; Lin, Hsuan-Tien
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2305.08295

_version_	1866912428958154752
author	Wang, Hsiu-Hsuan; Mai, Tan-Ha; Ye, Nai-Xuan; Lin, Wei-I; Lin, Hsuan-Tien
author_facet	Wang, Hsiu-Hsuan; Mai, Tan-Ha; Ye, Nai-Xuan; Lin, Wei-I; Lin, Hsuan-Tien
contents	Complementary-label learning (CLL) is a weakly supervised learning paradigm that aims to train a multi-class classifier using only complementary labels, which indicate classes to which an instance does not belong. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. First, these algorithms often rely on assumptions about how complementary labels are generated, and it is unclear how far these assumptions are from reality. Second, their evaluation has been limited to synthetically labeled datasets. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators. Our efforts resulted in four datasets: CLCIFAR10, CLCIFAR20, CLMicroImageNet10, and CLMicroImageNet20, derived from the well-known classification datasets CIFAR10, CIFAR100, and TinyImageNet200. These datasets, collectively named CLImage, are the first real-world CLL datasets and are publicly available at https://github.com/ntucllab/CLImage_Dataset. Through extensive benchmark experiments, we discovered a notable decrease in performance when transitioning from synthetically labeled datasets to real-world datasets. We investigated the key factors contributing to the decrease with a thorough dataset-level ablation study. Our analyses highlight annotation noise as the most influential factor in the real-world datasets. In addition, we found that the biased nature of human-annotated complementary labels and the difficulty of validating with only complementary labels are two outstanding barriers to practical CLL. These findings suggest that the community should focus more research effort on developing CLL algorithms and validation schemes that are robust to noisy and biased complementary-label distributions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2305_08295
institution	arXiv
publishDate	2023
record_format	arxiv
title	CLImage: Human-Annotated Datasets for Complementary-Label Learning
topic	Machine Learning; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2305.08295
