Saved in:
| Main Authors: | , , , , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.04519 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866912571415592960 |
|---|---|
| author | Badash, Zvi Ben-Atya, Hadas Gavrielov, Naama Hazan, Liam Focht, Gili Cytter-Kuint, Ruth Hagopian, Talar Turner, Dan Freiman, Moti |
| author_facet | Badash, Zvi Ben-Atya, Hadas Gavrielov, Naama Hazan, Liam Focht, Gili Cytter-Kuint, Ruth Hagopian, Talar Turner, Dan Freiman, Moti |
| contents | Extracting structured clinical information from radiology reports is challenging, especially in low-resource languages. This is pronounced in Crohn's disease, with sparsely represented multi-organ findings. We developed Hierarchical Structured Matching Prediction BERT (HSMP-BERT), a prompt-based model for extraction from Hebrew radiology text. In an administrative database study, we analyzed 9,683 reports from Crohn's patients imaged 2010-2023 across Israeli providers. A subset of 512 reports was radiologist-annotated for findings across six gastrointestinal organs and 15 pathologies, yielding 90 structured labels per subject. Multilabel-stratified split (66% train+validation; 33% test), preserving label prevalence. Performance was evaluated with accuracy, F1, Cohen's $κ$, AUC, PPV, NPV, and recall. On 24 organ-finding combinations with $>$15 positives, HSMP-BERT achieved mean F1 0.83$\pm$0.08 and $κ$ 0.65$\pm$0.17, outperforming the SMP zero-shot baseline (F1 0.49$\pm$0.07, $κ$ 0.06$\pm$0.07) and standard fine-tuning (F1 0.30$\pm$0.27, $κ$ 0.27$\pm$0.34; paired t-test $p < 10^{-7}$). Hierarchical inference cuts runtime 5.1$\times$ vs. traditional inference. Applied to all reports, it revealed associations among ileal wall thickening, stenosis, and pre-stenotic dilatation, plus age- and sex-specific trends in inflammatory findings. HSMP-BERT offers a scalable solution for structured extraction in radiology, enabling population-level analysis of Crohn's disease and demonstrating AI's potential in low-resource settings. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2509_04519 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Hierarchical Section Matching Prediction (HSMP) BERT for Fine-Grained Extraction of Structured Data from Hebrew Free-Text Radiology Reports in Crohn's Disease Badash, Zvi Ben-Atya, Hadas Gavrielov, Naama Hazan, Liam Focht, Gili Cytter-Kuint, Ruth Hagopian, Talar Turner, Dan Freiman, Moti Computation and Language Extracting structured clinical information from radiology reports is challenging, especially in low-resource languages. This is pronounced in Crohn's disease, with sparsely represented multi-organ findings. We developed Hierarchical Structured Matching Prediction BERT (HSMP-BERT), a prompt-based model for extraction from Hebrew radiology text. In an administrative database study, we analyzed 9,683 reports from Crohn's patients imaged 2010-2023 across Israeli providers. A subset of 512 reports was radiologist-annotated for findings across six gastrointestinal organs and 15 pathologies, yielding 90 structured labels per subject. Multilabel-stratified split (66% train+validation; 33% test), preserving label prevalence. Performance was evaluated with accuracy, F1, Cohen's $κ$, AUC, PPV, NPV, and recall. On 24 organ-finding combinations with $>$15 positives, HSMP-BERT achieved mean F1 0.83$\pm$0.08 and $κ$ 0.65$\pm$0.17, outperforming the SMP zero-shot baseline (F1 0.49$\pm$0.07, $κ$ 0.06$\pm$0.07) and standard fine-tuning (F1 0.30$\pm$0.27, $κ$ 0.27$\pm$0.34; paired t-test $p < 10^{-7}$). Hierarchical inference cuts runtime 5.1$\times$ vs. traditional inference. Applied to all reports, it revealed associations among ileal wall thickening, stenosis, and pre-stenotic dilatation, plus age- and sex-specific trends in inflammatory findings. HSMP-BERT offers a scalable solution for structured extraction in radiology, enabling population-level analysis of Crohn's disease and demonstrating AI's potential in low-resource settings. |
| title | Hierarchical Section Matching Prediction (HSMP) BERT for Fine-Grained Extraction of Structured Data from Hebrew Free-Text Radiology Reports in Crohn's Disease |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2509.04519 |