Bibliographic Details
Main Authors: Heidari, Moein, Bozorgpour, Afshin, Zarif-Fakharnia, AmirHossein, Chen, Wenjin, Merhof, Dorit, Foran, David J, Grewal, Jasmine, Hacihaliloglu, Ilker
Format: Preprint
Published: 2025
Subjects: Image and Video Processing; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2503.17543
author Heidari, Moein
Bozorgpour, Afshin
Zarif-Fakharnia, AmirHossein
Chen, Wenjin
Merhof, Dorit
Foran, David J
Grewal, Jasmine
Hacihaliloglu, Ilker
contents Objective: To develop a robust and computationally efficient deep learning model for automated left ventricular ejection fraction (LVEF) estimation from echocardiography videos that is suitable for real-time point-of-care ultrasound (POCUS) deployment.
Methods: We propose Echo-E$^3$Net, an endocardial spatio-temporal network that explicitly incorporates cardiac anatomy into LVEF prediction. The model comprises a dual-phase Endocardial Border Detector (E$^2$CBD) that uses phase-specific cross attention to localize end-diastolic and end-systolic endocardial landmarks and to learn phase-aware landmark embeddings, and an Endocardial Feature Aggregator (E$^2$FA) that fuses these embeddings with global statistical descriptors of deep feature maps to refine EF regression. Training is guided by a multi-component loss inspired by Simpson's biplane method that jointly supervises EF and landmark geometry. We evaluate Echo-E$^3$Net on the EchoNet-Dynamic dataset using RMSE and R$^2$ while reporting parameter count and GFLOPs to characterize efficiency.
Results: On EchoNet-Dynamic, Echo-E$^3$Net achieves an RMSE of 5.20 and an R$^2$ score of 0.82 while using only 1.55M parameters and 8.05 GFLOPs. The model operates without external pre-training, heavy data augmentation, or test-time ensembling, supporting practical real-time deployment.
Conclusion: By combining phase-aware endocardial landmark modeling with lightweight spatio-temporal feature aggregation, Echo-E$^3$Net improves the efficiency and robustness of automated LVEF estimation and is well-suited for scalable clinical use in POCUS settings. Code is available at https://github.com/moeinheidari7829/Echo-E3Net
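The quantities named in the abstract (ejection fraction, RMSE, R$^2$) can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `ejection_fraction`, `rmse`, and `r2_score` and all sample values are assumptions for illustration, and the code uses only the standard textbook definitions, not the Echo-E$^3$Net training loss or evaluation pipeline.

```python
import math


def ejection_fraction(edv, esv):
    """Ejection fraction in percent from end-diastolic (edv) and
    end-systolic (esv) left ventricular volumes, in the same unit."""
    if edv <= 0:
        raise ValueError("edv must be positive")
    return 100.0 * (edv - esv) / edv


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted EF values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

For example, an end-diastolic volume of 120 mL and an end-systolic volume of 48 mL give an ejection fraction of 60%.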
format Preprint
id arxiv_https___arxiv_org_abs_2503_17543
institution arXiv
publishDate 2025
record_format arxiv
title Echo-E$^3$Net: Efficient Endocardial Spatio-Temporal Network for Ejection Fraction Estimation
topic Image and Video Processing
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2503.17543