Saved in:
Bibliographic Details
Main Authors: Grutman, Tal, Shinar, Carmel, Ilovitsh, Tali
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.01444
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866914300374810624
author Grutman, Tal
Shinar, Carmel
Ilovitsh, Tali
author_facet Grutman, Tal
Shinar, Carmel
Ilovitsh, Tali
contents Ultrasound is the most widely used medical imaging modality, yet the images it produces are fundamentally unique, arising from tissue-dependent scattering, reflection, and speed-of-sound variations that produce a constrained set of characteristic textures that differ markedly from natural-image statistics. These acoustically driven patterns make ultrasound challenging for algorithms originally designed for natural images. To bridge this gap, the field has increasingly turned to foundation models, hoping to leverage their generalization capabilities. However, these models often falter in ultrasound applications because they are not designed for ultrasound physics, they are merely trained on ultrasound data. Therefore, it is essential to integrate ultrasound-specific domain knowledge into established learning frameworks. We achieve this by reformulating self-supervised learning as a texture-analysis problem, introducing texture ultrasound semantic analysis (TUSA). Using TUSA, models learn to leverage highly scalable contrastive methods to extract true domain-specific representations directly from simple B-mode images. We train a TUSA model on a combination of open-source, simulated, and in vivo data. The latent space is compared to several larger foundation models, demonstrating that our approach gives TUSA models better generalizability for difficult downstream tasks on unique online datasets as well as a clinical eye dataset collected for this study. Our model achieves higher accuracy in detecting COVID (70%), spinal hematoma (100%) and vitreous hemorrhage (97%) and correlates more closely with quantitative parameters like liver steatosis (r = 0.83), ejection fraction (r = 0.63), and oxygen saturation (r = 0.38). We open-source the model weights and training script: https://github.com/talg2324/tusa
format Preprint
id arxiv_https___arxiv_org_abs_2602_01444
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle A texture-based framework for foundational ultrasound models
Grutman, Tal
Shinar, Carmel
Ilovitsh, Tali
Image and Video Processing
Computer Vision and Pattern Recognition
Ultrasound is the most widely used medical imaging modality, yet the images it produces are fundamentally unique, arising from tissue-dependent scattering, reflection, and speed-of-sound variations that produce a constrained set of characteristic textures that differ markedly from natural-image statistics. These acoustically driven patterns make ultrasound challenging for algorithms originally designed for natural images. To bridge this gap, the field has increasingly turned to foundation models, hoping to leverage their generalization capabilities. However, these models often falter in ultrasound applications because they are not designed for ultrasound physics, they are merely trained on ultrasound data. Therefore, it is essential to integrate ultrasound-specific domain knowledge into established learning frameworks. We achieve this by reformulating self-supervised learning as a texture-analysis problem, introducing texture ultrasound semantic analysis (TUSA). Using TUSA, models learn to leverage highly scalable contrastive methods to extract true domain-specific representations directly from simple B-mode images. We train a TUSA model on a combination of open-source, simulated, and in vivo data. The latent space is compared to several larger foundation models, demonstrating that our approach gives TUSA models better generalizability for difficult downstream tasks on unique online datasets as well as a clinical eye dataset collected for this study. Our model achieves higher accuracy in detecting COVID (70%), spinal hematoma (100%) and vitreous hemorrhage (97%) and correlates more closely with quantitative parameters like liver steatosis (r = 0.83), ejection fraction (r = 0.63), and oxygen saturation (r = 0.38). We open-source the model weights and training script: https://github.com/talg2324/tusa
title A texture-based framework for foundational ultrasound models
topic Image and Video Processing
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.01444