Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Tong, Shen, Shu, Chen, C. L. Philip
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.19674
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866912733257007104
author	Zhang, Tong Shen, Shu Chen, C. L. Philip
author_facet	Zhang, Tong Shen, Shu Chen, C. L. Philip
contents	Multimodal learning enhances the performance of various machine learning tasks by leveraging complementary information across different modalities. However, existing methods often learn multimodal representations that retain substantial inter-class confusion, making it difficult to achieve high-confidence predictions, particularly in real-world scenarios with low-quality or noisy data. To address this challenge, we propose Multi-Level Adaptive DeConfusion (MLAD), which eliminates inter-class confusion in multimodal data at both global and sample levels, significantly enhancing the classification reliability of multimodal models. Specifically, MLAD first learns class-wise latent distributions with global-level confusion removed via dynamic-exit modality encoders that adapt to the varying discrimination difficulty of each class and a cross-class residual reconstruction mechanism. Subsequently, MLAD further removes sample-specific confusion through sample-adaptive cross-modality rectification guided by confusion-free modality priors. These priors are constructed from low-confusion modality features, identified by evaluating feature confusion using the learned class-wise latent distributions and selecting those with low confusion via a Gaussian mixture model. Experiments demonstrate that MLAD outperforms state-of-the-art methods across multiple benchmarks and exhibits superior reliability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_19674
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Reliable Multimodal Learning Via Multi-Level Adaptive DeConfusion Zhang, Tong Shen, Shu Chen, C. L. Philip Computer Vision and Pattern Recognition Multimodal learning enhances the performance of various machine learning tasks by leveraging complementary information across different modalities. However, existing methods often learn multimodal representations that retain substantial inter-class confusion, making it difficult to achieve high-confidence predictions, particularly in real-world scenarios with low-quality or noisy data. To address this challenge, we propose Multi-Level Adaptive DeConfusion (MLAD), which eliminates inter-class confusion in multimodal data at both global and sample levels, significantly enhancing the classification reliability of multimodal models. Specifically, MLAD first learns class-wise latent distributions with global-level confusion removed via dynamic-exit modality encoders that adapt to the varying discrimination difficulty of each class and a cross-class residual reconstruction mechanism. Subsequently, MLAD further removes sample-specific confusion through sample-adaptive cross-modality rectification guided by confusion-free modality priors. These priors are constructed from low-confusion modality features, identified by evaluating feature confusion using the learned class-wise latent distributions and selecting those with low confusion via a Gaussian mixture model. Experiments demonstrate that MLAD outperforms state-of-the-art methods across multiple benchmarks and exhibits superior reliability.
title	Reliable Multimodal Learning Via Multi-Level Adaptive DeConfusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2502.19674

Similar Items