Saved in:
Bibliographic Details
Main Authors: Bala, Gouranga, Gupta, Anuj, Behera, Subrat Kumar, Sethi, Amit
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2412.04898
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910730795614208
author Bala, Gouranga
Gupta, Anuj
Behera, Subrat Kumar
Sethi, Amit
author_facet Bala, Gouranga
Gupta, Anuj
Behera, Subrat Kumar
Sethi, Amit
contents Deep learning models rely heavily on large volumes of labeled data to achieve high performance. However, real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process. Instance-dependent label noise (IDN), where the probability of a label being corrupted depends on the input features, poses a significant challenge because it is more prevalent and harder to address than instance-independent noise. In this paper, we propose a novel hybrid framework that combines self-supervised learning using SimCLR with iterative pseudo-label refinement to mitigate the effects of IDN. The self-supervised pre-training phase enables the model to learn robust feature representations without relying on potentially noisy labels, establishing a noise-agnostic foundation. Subsequently, we employ an iterative training process with pseudo-label refinement, where confidently predicted samples are identified through a multistage approach and their labels are updated to improve label quality progressively. We evaluate our method on the CIFAR-10 and CIFAR-100 datasets augmented with synthetic instance-dependent noise at varying noise levels. Experimental results demonstrate that our approach significantly outperforms several state-of-the-art methods, particularly under high noise conditions, achieving notable improvements in classification accuracy and robustness. Our findings suggest that integrating self-supervised learning with iterative pseudo-label refinement offers an effective strategy for training deep neural networks on noisy datasets afflicted by instance-dependent label noise.
format Preprint
id arxiv_https___arxiv_org_abs_2412_04898
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement
Bala, Gouranga
Gupta, Anuj
Behera, Subrat Kumar
Sethi, Amit
Computer Vision and Pattern Recognition
Machine Learning
Deep learning models rely heavily on large volumes of labeled data to achieve high performance. However, real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process. Instance-dependent label noise (IDN), where the probability of a label being corrupted depends on the input features, poses a significant challenge because it is more prevalent and harder to address than instance-independent noise. In this paper, we propose a novel hybrid framework that combines self-supervised learning using SimCLR with iterative pseudo-label refinement to mitigate the effects of IDN. The self-supervised pre-training phase enables the model to learn robust feature representations without relying on potentially noisy labels, establishing a noise-agnostic foundation. Subsequently, we employ an iterative training process with pseudo-label refinement, where confidently predicted samples are identified through a multistage approach and their labels are updated to improve label quality progressively. We evaluate our method on the CIFAR-10 and CIFAR-100 datasets augmented with synthetic instance-dependent noise at varying noise levels. Experimental results demonstrate that our approach significantly outperforms several state-of-the-art methods, particularly under high noise conditions, achieving notable improvements in classification accuracy and robustness. Our findings suggest that integrating self-supervised learning with iterative pseudo-label refinement offers an effective strategy for training deep neural networks on noisy datasets afflicted by instance-dependent label noise.
title Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2412.04898