Staff View: Re-M3Dr: Rebalanced MultiModal Mean Deviation Regression :: Library Catalog

Bibliographic Details
Main Authors:	Yin, Haojie; Feng, Chengcheng; Liu, Tianyi; Zhang, Tianqi; Huang, Kaizhu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.26513

_version_	1866916046910259200
author	Yin, Haojie; Feng, Chengcheng; Liu, Tianyi; Zhang, Tianqi; Huang, Kaizhu
author_facet	Yin, Haojie; Feng, Chengcheng; Liu, Tianyi; Zhang, Tianqi; Huang, Kaizhu
contents	Mean Deviation (MD) is a critical metric for assessing visual field loss in ophthalmology. Previous work has focused solely on predicting MD from Optical Coherence Tomography (OCT), yet it is intuitive to assume that combining OCT with a second imaging modality, fundus photography (FP), could improve performance, since the two ophthalmic imaging modalities provide complementary information. This is particularly expected when sophisticated multi-objective optimization is applied, as documented for common multimodal classification tasks. Surprisingly, our investigations reveal that multimodal fusion in this medical imaging scenario performs worse than a unimodal model. Through detailed analysis, we identify the root cause as a coupled imbalance between data distribution and modality learning conflict. This imbalance distorts the optimization landscape, leading to unstable training. To address this challenge, we propose Rebalanced MultiModal Mean Deviation Regression (Re-M3Dr), a novel multimodal regression framework. We first enhance unimodal representations through adaptive-margin supervised contrastive learning. Our framework then stabilizes the joint optimization with sharpness-aware gradient modulation. Experimental results on both public and private clinical datasets show an average 29% reduction in MSE compared to SOTA multimodal learning methods, demonstrating the superiority of Re-M3Dr. The code is available in the supplementary materials.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_26513
institution	arXiv
publishDate	2026
record_format	arxiv
title	Re-M3Dr: Rebalanced MultiModal Mean Deviation Regression
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2605.26513
