Bibliographic Details
Main Authors: Levy, Devon, Assayag, Bar, Gaspar, Laura, Shimshoni, Ilan, Specktor-Fadida, Bella
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2601.18532
_version_ 1866911410573803520
author Levy, Devon
Assayag, Bar
Gaspar, Laura
Shimshoni, Ilan
Specktor-Fadida, Bella
author_facet Levy, Devon
Assayag, Bar
Gaspar, Laura
Shimshoni, Ilan
Specktor-Fadida, Bella
contents Accurate segmentation annotations are critical for disease monitoring, yet manual labeling remains a major bottleneck due to the time and expertise required. Active learning (AL) alleviates this burden by prioritizing informative samples for annotation, typically through a diversity-based cold-start phase followed by uncertainty-driven selection. We propose a novel cold-start sampling strategy that combines foundation-model embeddings with clustering, including automatic selection of the number of clusters and proportional sampling across clusters, to construct a diverse and representative initial training set. This is followed by an uncertainty-based AL framework that integrates spatial diversity to guide sample selection. The proposed method is intuitive and interpretable, enabling visualization of the feature-space distribution of candidate samples. We evaluate our approach on three datasets spanning X-ray and MRI modalities. On the CheXmask dataset, the cold-start strategy outperforms random selection, improving Dice from 0.918 to 0.929 and reducing the Hausdorff distance from 32.41 to 27.66 mm. In the AL setting, combined entropy and diversity selection improves Dice from 0.919 to 0.939 and reduces the Hausdorff distance from 30.10 to 19.16 mm. On the Montgomery dataset, cold-start gains are substantial, with Dice improving from 0.928 to 0.950 and Hausdorff distance decreasing from 14.22 to 9.38 mm. On the SynthStrip dataset, cold-start selection slightly affects Dice but reduces the Hausdorff distance from 9.43 to 8.69 mm, while active learning improves Dice from 0.816 to 0.826 and reduces the Hausdorff distance from 7.76 to 6.38 mm. Overall, the proposed framework consistently outperforms baseline methods in low-data regimes, improving segmentation accuracy.
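The abstract mentions proportional sampling across clusters for the cold-start set. The sketch below illustrates one way such a step could look; it is not the authors' implementation. It assumes cluster labels have already been computed from the embeddings, uses largest-remainder rounding to split the annotation budget, and picks scans uniformly at random within each cluster. The function names `proportional_quota` and `cold_start_select`, the rounding rule, and the within-cluster random choice are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def proportional_quota(cluster_sizes, budget):
    """Split `budget` picks across clusters in proportion to cluster size.

    Assumption: largest-remainder rounding (the record does not say how
    fractional quotas are resolved).
    """
    total = sum(cluster_sizes.values())
    if budget < 0 or budget > total:
        raise ValueError("budget must be between 0 and the number of scans")
    # Ideal (fractional) share of the budget for each cluster.
    exact = {c: budget * n / total for c, n in cluster_sizes.items()}
    # Start from the floor of each share.
    quota = {c: int(x) for c, x in exact.items()}
    leftover = budget - sum(quota.values())
    # Hand the remaining picks to the clusters with the largest fractions.
    by_fraction = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_fraction[:leftover]:
        quota[c] += 1
    return quota


def cold_start_select(labels, budget, seed=0):
    """Return sorted indices of scans chosen for the initial training set.

    `labels[i]` is the (precomputed) cluster id of scan i.
    Assumption: scans are drawn uniformly at random within each cluster.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    sizes = {c: len(m) for c, m in members.items()}
    quota = proportional_quota(sizes, budget)
    chosen = []
    for c, m in members.items():
        chosen.extend(rng.sample(m, quota[c]))
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical toy labels: three clusters of sizes 6, 3 and 1.
    toy_labels = [0] * 6 + [1] * 3 + [2] * 1
    print(cold_start_select(toy_labels, budget=5))
```

Proportional allocation keeps the initial set representative of the pool; a variant that guarantees at least one scan per cluster would favor diversity instead, and which trade-off the paper makes is not stated in this record.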
format Preprint
id arxiv_https___arxiv_org_abs_2601_18532
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle From Cold Start to Active Learning: Embedding-Based Scan Selection for Medical Image Segmentation
Levy, Devon
Assayag, Bar
Gaspar, Laura
Shimshoni, Ilan
Specktor-Fadida, Bella
Computer Vision and Pattern Recognition
Machine Learning
I.4.6; J.3
title From Cold Start to Active Learning: Embedding-Based Scan Selection for Medical Image Segmentation
topic Computer Vision and Pattern Recognition
Machine Learning
I.4.6; J.3
url https://arxiv.org/abs/2601.18532