Bibliographic Details
Main Authors: Guo, Xuhui; Dam, Tanmoy; Dhamdhere, Rohan; Modanwal, Gourav; Madabhushi, Anant
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2501.07017
author Guo, Xuhui
Dam, Tanmoy
Dhamdhere, Rohan
Modanwal, Gourav
Madabhushi, Anant
contents 3D medical image segmentation has progressed considerably thanks to Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), yet these methods struggle to balance the capture of long-range dependencies with computational efficiency. To address this challenge, we propose UNETVL (U-Net Vision-LSTM), a novel architecture that leverages recent advances in temporal information processing. UNETVL incorporates Vision-LSTM (ViL) for improved scalability and memory functions, alongside an efficient Chebyshev Kolmogorov-Arnold Network (KAN) to handle complex, long-range dependency patterns more effectively. We validated our method on the ACDC and AMOS2022 (post-challenge Task 2) benchmark datasets, where it shows a significant improvement in mean Dice score over recent state-of-the-art approaches, especially over its predecessor, UNETR, with increases of 7.3% on ACDC and 15.6% on AMOS. Extensive ablation studies demonstrate the impact of each component of UNETVL and provide a comprehensive understanding of its architecture. Our code is available at https://github.com/tgrex6/UNETVL, facilitating further research and applications in this domain.
format Preprint
id arxiv_https___arxiv_org_abs_2501_07017
institution arXiv
publishDate 2025
record_format arxiv
title UNetVL: Enhancing 3D Medical Image Segmentation with Chebyshev KAN Powered Vision-LSTM
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2501.07017
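
Note on the metric: the abstract reports results as mean Dice score. As a rough illustration only, the sketch below computes a per-class Dice score, 2|P ∩ T| / (|P| + |T|), and averages it over foreground labels. It is not taken from the UNETVL repository. The function names (`flatten`, `dice_score`, `mean_dice`), the nested-list volume layout, and the choice to skip classes absent from both volumes are all assumptions made for this example.

```python
from itertools import chain


def flatten(volume):
    """Flatten a nested-list 3D label volume (depth x height x width)."""
    return list(chain.from_iterable(chain.from_iterable(volume)))


def dice_score(pred, target, label):
    """Dice = 2|P ∩ T| / (|P| + |T|) for one class label on flat label lists."""
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    if denom == 0:
        # Assumption: a class absent from both volumes is excluded from the mean.
        return None
    return 2.0 * intersection / denom


def mean_dice(pred_volume, target_volume, labels):
    """Average the per-class Dice score over the given foreground labels."""
    pred, target = flatten(pred_volume), flatten(target_volume)
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same number of voxels")
    scores = [dice_score(pred, target, c) for c in labels]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else float("nan")


# Toy 2x2x2 volumes with background 0 and two foreground classes (1 and 2).
target = [[[0, 1], [1, 2]], [[2, 2], [0, 0]]]
pred = [[[0, 1], [0, 2]], [[2, 2], [0, 1]]]

print(mean_dice(pred, target, labels=[1, 2]))  # class 1: 0.5, class 2: 1.0 -> 0.75
```

Conventions differ between benchmarks on whether background is included and how empty classes are scored, so published numbers should be compared only under the same evaluation protocol.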