Bibliographic Details
Main Authors: Sun, Shoukun; Wang, Zhe; Que, Xiang; Zhang, Jiyin; Ma, Xiaogang
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.19736
author Sun, Shoukun
Wang, Zhe
Que, Xiang
Zhang, Jiyin
Ma, Xiaogang
contents While diffusion models have achieved state-of-the-art performance in Image Super-Resolution (SR), their prohibitive computational and memory demands restrict their training and inference to fixed-size inputs. The standard workaround to super-resolve larger images relies on partitioning the image, super-resolving patches independently, and stitching them together, a process that inevitably introduces severe boundary artifacts and spatial inconsistencies in large-scale scenes. To achieve spatially continuous, arbitrary-size image super-resolution, we propose InfScene-SR, a diffusion-based SR approach. Building upon SR3, our approach leverages Variance-Corrected Fusion (VCF) to perform joint-denoising across overlapping patches. VCF guarantees continuous transitions while preserving the stochastic variance crucial for high-fidelity texture reconstruction. To overcome the prohibitive synchronization overhead of scaling joint-denoising to gigapixel imagery, we introduce Spatially-Decoupled Variance Correction (SDVC). SDVC reformulates the global fusion process into independent, atomic patch operations, drastically reducing memory complexity to $\mathcal{O}(1)$ and naturally enabling fully distributed, parallelized inference. Extensive experiments on large-scale remote sensing datasets demonstrate that InfScene-SR strictly eliminates boundary seams, achieves superior perceptual quality, and significantly boosts performance in a downstream semantic segmentation task.
format Preprint
id arxiv_https___arxiv_org_abs_2602_19736
institution arXiv
publishDate 2026
record_format arxiv
title InfScene-SR: Arbitrary-Size Image Super-Resolution via Iterative Joint-Denoising
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.19736