Staff View: Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D :: Library Catalog

Bibliographic Details
Main Authors:	Barone, Mariano; Di Serio, Francesco; Riccio, Giuseppe; Romano, Antonio; Postiglione, Marco; Ferraro, Antonino; Moscato, Vincenzo
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.22098

_version_	1866911468570542080
author	Barone, Mariano; Di Serio, Francesco; Riccio, Giuseppe; Romano, Antonio; Postiglione, Marco; Ferraro, Antonino; Moscato, Vincenzo
author_facet	Barone, Mariano; Di Serio, Francesco; Riccio, Giuseppe; Romano, Antonio; Postiglione, Marco; Ferraro, Antonino; Moscato, Vincenzo
contents	Current medical vision-language models (VLMs) process volumetric brain MRI using 2D slice-based approximations, fragmenting the spatial context required for accurate neuroradiological interpretation. We developed Brain3D, a staged vision-language framework for automated radiology report generation from 3D brain tumor MRI. Our approach inflates a pretrained 2D medical encoder into a native 3D architecture and progressively aligns it with a causal language model through three stages: contrastive grounding, supervised projector warmup, and LoRA-based linguistic specialization. Unlike generalist 3D medical VLMs, Brain3D is tailored to neuroradiology, where hemispheric laterality, tumor infiltration patterns, and anatomical localization are critical. Evaluated on 468 subjects (BraTS pathological cases plus healthy controls), our model achieves a Clinical Pathology F1 of 0.951, versus 0.413 for a strong 2D baseline, while maintaining perfect specificity on healthy scans. The staged alignment proves essential: contrastive grounding establishes visual-textual correspondence, projector warmup stabilizes conditioning, and LoRA adaptation shifts the output from verbose captions to structured clinical reports. Our code is publicly available for transparency and reproducibility.
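The abstract does not say how the pretrained 2D encoder is inflated to 3D. One common scheme, used in I3D-style video models, replicates each 2D kernel along the new depth axis and divides by the depth, so that a volume whose slices are all identical produces the same tokens as the 2D encoder applied to one slice. The sketch below assumes that averaging scheme and a ViT-style patch embedding whose stride equals its patch size; the function names (`inflate_patch_embed`, `patch_embed_2d`, `patch_embed_3d`) are hypothetical and are not taken from the Brain3D code.

```python
import numpy as np


def inflate_patch_embed(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Inflate a 2D patch-embedding kernel (out, in, p, p) to 3D (out, in, depth, p, p).

    Assumption: I3D-style averaging inflation (replicate, then divide by depth).
    """
    if w2d.ndim != 4:
        raise ValueError("expected a 4D kernel of shape (out, in, p, p)")
    # Copy the 2D kernel into every depth position, then rescale so that the
    # summed response over a depth-constant input equals the 2D response.
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], depth, axis=2)
    return w3d / depth


def patch_embed_2d(x: np.ndarray, w: np.ndarray, p: int) -> np.ndarray:
    """Non-overlapping patch embedding of an image x (in, H, W) -> tokens (N, out)."""
    c, h, wd = x.shape
    if h % p or wd % p:
        raise ValueError("image size must be divisible by the patch size")
    # Split into patches: (H/p, W/p, in, p, p).
    patches = x.reshape(c, h // p, p, wd // p, p).transpose(1, 3, 0, 2, 4)
    # Each token is the kernel contracted with one patch (equivalent to a
    # convolution with stride == kernel size).
    tokens = np.einsum("ijcab,ocab->ijo", patches, w)
    return tokens.reshape(-1, w.shape[0])


def patch_embed_3d(x: np.ndarray, w: np.ndarray, pd: int, p: int) -> np.ndarray:
    """Non-overlapping patch embedding of a volume x (in, D, H, W) -> tokens (N, out)."""
    c, d, h, wd = x.shape
    if d % pd or h % p or wd % p:
        raise ValueError("volume size must be divisible by the patch size")
    # Split into cuboid patches: (D/pd, H/p, W/p, in, pd, p, p).
    patches = x.reshape(c, d // pd, pd, h // p, p, wd // p, p).transpose(1, 3, 5, 0, 2, 4, 6)
    tokens = np.einsum("kijcdab,ocdab->kijo", patches, w)
    return tokens.reshape(-1, w.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w2d = rng.standard_normal((8, 3, 4, 4))          # toy 2D patch kernel
    image = rng.standard_normal((3, 8, 8))           # toy 3-channel slice
    volume = np.repeat(image[:, np.newaxis], 2, axis=1)  # 2 identical slices
    w3d = inflate_patch_embed(w2d, depth=2)
    # With averaging inflation, the 3D tokens match the 2D tokens exactly.
    print(np.allclose(patch_embed_2d(image, w2d, 4), patch_embed_3d(volume, w3d, 2, 4)))  # True
```

The divide-by-depth step is what makes the inflated encoder a faithful starting point: before any 3D fine-tuning, it behaves like the pretrained 2D encoder on slice-constant input, and training can then learn genuine inter-slice structure. Other inflation choices (for example, placing the 2D kernel only at the central depth position) satisfy the same property; which one Brain3D uses is not stated in this record.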
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_22098
institution	arXiv
publishDate	2026
record_format	arxiv
title	Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.22098

Similar Items