Bibliographic Details
Main Authors: Luu, Minh Sao Khue; Benedichuk, Margaret V.; Roppert, Ekaterina I.; Kenzhin, Roman M.; Tuchinov, Bair N.
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2510.20196
author Luu, Minh Sao Khue
Benedichuk, Margaret V.
Roppert, Ekaterina I.
Kenzhin, Roman M.
Tuchinov, Bair N.
contents The development of foundation models for brain MRI depends critically on the scale, diversity, and consistency of available data, yet systematic assessments of these factors remain scarce. In this study, we analyze 54 publicly accessible brain MRI datasets encompassing over 538,031 to provide a structured, multi-level overview tailored to foundation model development. At the dataset level, we characterize modality composition, disease coverage, and dataset scale, revealing strong imbalances between large healthy cohorts and smaller clinical populations. At the image level, we quantify voxel spacing, orientation, and intensity distributions across 15 representative datasets, demonstrating substantial heterogeneity that can influence representation learning. We then perform a quantitative evaluation of preprocessing variability, examining how intensity normalization, bias field correction, skull stripping, spatial registration, and interpolation alter voxel statistics and geometry. While these steps improve within-dataset consistency, residual differences persist between datasets. Finally, a feature-space case study using a 3D DenseNet121 shows measurable residual covariate shift after standardized preprocessing, confirming that harmonization alone cannot eliminate inter-dataset bias. Together, these analyses provide a unified characterization of variability in public brain MRI resources and emphasize the need for preprocessing-aware and domain-adaptive strategies in the design of generalizable brain MRI foundation models.
format Preprint
id arxiv_https___arxiv_org_abs_2510_20196
institution arXiv
publishDate 2025
record_format arxiv
title A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2510.20196