Bibliographic Details
Main authors: Jain, Karan; Teli, Mohammad Nayeem
Format: Preprint
Published: 2025
Subjects:
Online access: https://arxiv.org/abs/2504.10883
author Jain, Karan
Teli, Mohammad Nayeem
author_facet Jain, Karan
Teli, Mohammad Nayeem
contents Diffusion models have recently achieved state-of-the-art performance on many image generation tasks. However, most models require significant computational resources to do so. This is especially apparent in medical image synthesis because of the 3D nature of medical datasets such as CT scans, MRIs, and electron microscopy. In this paper we propose a novel architecture for memory-efficient, single-GPU training of diffusion models on high-dimensional medical datasets. The proposed model is built on an invertible UNet architecture with invertible attention modules. This leads to the following two contributions: 1. enabling the memory usage of denoising diffusion models to be independent of the dimensionality of the dataset, and 2. reducing the energy usage during training. While this new model can be applied to a multitude of image generation tasks, we showcase its memory efficiency on the 3D BraTS2020 dataset, where it yields up to a 15% decrease in peak memory consumption during training while maintaining image quality comparable to the state of the art.
format Preprint
id arxiv_https___arxiv_org_abs_2504_10883
institution arXiv
publishDate 2025
record_format arxiv
title Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2504.10883
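
The abstract in this record attributes the memory savings to invertible building blocks. As a rough illustration of why invertibility helps, the sketch below uses a generic additive coupling block (in the style of reversible residual networks), which is an assumption here: the record does not describe the authors' actual block design. The functions `f` and `g` are hypothetical stand-ins for learned sub-networks such as convolutions or attention. Because the block's inputs can be recomputed exactly from its outputs, intermediate activations would not need to be stored for the backward pass.

```python
import math

def f(x):
    # Hypothetical stand-in for a learned sub-network; it need not be invertible itself.
    return [math.tanh(v) for v in x]

def g(x):
    # Second hypothetical sub-network, likewise arbitrary.
    return [0.5 * v * v for v in x]

def forward(x1, x2):
    # Additive coupling: each half is shifted by a function of the other half.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the two shifts in reverse order to recover the inputs from the outputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.1, -2.0, 3.5], [1.0, 0.0, -0.7]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
# The reconstruction matches the original inputs up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
```

In a training setting, a framework would call something like `inverse` during backpropagation to rebuild each layer's input on the fly instead of caching it, trading extra computation for lower peak memory.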