Bibliographic Details
Main Authors: Çoğalan, Uğur, Bemana, Mojtaba, Myszkowski, Karol, Seidel, Hans-Peter, Groth, Colin
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2509.01411
author Çoğalan, Uğur
Bemana, Mojtaba
Myszkowski, Karol
Seidel, Hans-Peter
Groth, Colin
contents We present MILO (Metric for Image- and Latent-space Optimization), a lightweight, multiscale, perceptual metric for full-reference image quality assessment (FR-IQA). MILO is trained using pseudo-MOS (Mean Opinion Score) supervision, in which reproducible distortions are applied to diverse images and scored via an ensemble of recent quality metrics that account for visual masking effects. This approach enables accurate learning without requiring large-scale human-labeled datasets. Despite its compact architecture, MILO outperforms existing metrics across standard FR-IQA benchmarks and offers fast inference suitable for real-time applications. Beyond quality prediction, we demonstrate the utility of MILO as a perceptual loss in both image and latent domains. In particular, we show that spatial masking modeled by MILO, when applied to latent representations from a VAE encoder within Stable Diffusion, enables efficient and perceptually aligned optimization. By combining spatial masking with a curriculum learning strategy, we first process perceptually less relevant regions before progressively shifting the optimization to more visually distorted areas. This strategy leads to significantly improved performance in tasks like denoising, super-resolution, and face restoration, while also reducing computational overhead. MILO thus functions as both a state-of-the-art image quality metric and as a practical tool for perceptual optimization in generative pipelines.
format Preprint
id arxiv_https___arxiv_org_abs_2509_01411
institution arXiv
publishDate 2025
record_format arxiv
title MILO: A Lightweight Perceptual Quality Metric for Image and Latent-Space Optimization
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2509.01411