Staff View: Foundation VAEs for 3D CT Reconstruction, Augmentation, and Generation :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Qi; Ding, Shuhan; Gu, Yu; Liu, Nan; Bian, Jiang; Yuille, Alan; Zhou, Zongwei; Fu, Jingjing
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.30893

_version_	1866913172240203776
author	Chen, Qi; Ding, Shuhan; Gu, Yu; Liu, Nan; Bian, Jiang; Yuille, Alan; Zhou, Zongwei; Fu, Jingjing
author_facet	Chen, Qi; Ding, Shuhan; Gu, Yu; Liu, Nan; Bian, Jiang; Yuille, Alan; Zhou, Zongwei; Fu, Jingjing
contents	Variational autoencoders (VAEs) compress high-resolution CT volumes into compact latents while preserving clinically relevant structure. However, training CT-specific VAEs from scratch or heavily fine-tuning them incurs substantial computational and engineering cost, and the resulting models often degrade under heterogeneous scanners, protocols, and diseases. This paper takes a step toward training-free medical VAEs by leveraging a key observation: a single Foundation VAE, pretrained at scale on natural images and videos, can serve as a unified interface for CT Reconstruction, Augmentation, and Generation. With both encoder and decoder frozen, the Foundation VAE reconstructs CT volumes with preserved anatomy while suppressing acquisition noise; training segmentation models on these reconstructions improves surface accuracy by 3.9% NSD on average for pancreatic and lung tumors. Within the same Foundation VAE latent space, a conditional latent diffusion model achieves 3.9% lower average FVD and a 36.2% higher CT CLIP score, and improves multi-disease generation faithfulness across 18 types by 2.76% AUC. These results demonstrate that Foundation VAEs are a practical interface for scalable CT representation reuse and faithful CT generation. Our code and demo are available at https://github.com/qic999/Foundation-VAE.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_30893
institution	arXiv
publishDate	2026
record_format	arxiv
title	Foundation VAEs for 3D CT Reconstruction, Augmentation, and Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2605.30893
