Bibliographic Details
Main Authors: Juscafresa, A. Nieto, Herreros, Á. Mazcuñán, Sullivan, J.
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2601.13416
author Juscafresa, A. Nieto
Herreros, Á. Mazcuñán
Sullivan, J.
contents Diffusion models have emerged as state-of-the-art generative methods for image synthesis, yet their potential as general-purpose feature encoders remains underexplored. Trained for denoising and generation without labels, they can be interpreted as self-supervised learners that capture both low- and high-level structure. We show that a frozen diffusion backbone enables strong fine-grained recognition by probing intermediate denoising features across layers and timesteps and training a linear classifier for each pair. We evaluate this in a real-world plankton-monitoring setting with practical impact, using controlled and comparable training setups against established supervised and self-supervised baselines. Frozen diffusion features are competitive with supervised baselines and outperform other self-supervised methods in both balanced and naturally long-tailed settings. Out-of-distribution evaluations on temporally and geographically shifted plankton datasets further show that frozen diffusion features maintain strong accuracy and Macro F1 under substantial distribution shift.
format Preprint
id arxiv_https___arxiv_org_abs_2601_13416
institution arXiv
publishDate 2026
record_format arxiv
title Diffusion Representations for Fine-Grained Image Classification: A Marine Plankton Case Study
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2601.13416
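The probing procedure summarized in the abstract (one linear classifier per layer and timestep pair on frozen features, scored by accuracy and Macro F1) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the layer names, timesteps, and feature arrays are hypothetical and synthetic, the signal strengths are invented and say nothing about which layers or timesteps work in the paper, and the closed-form ridge fit onto one-hot targets is a stand-in for whatever linear classifier the paper actually trains.

```python
import numpy as np

NUM_CLASSES = 3


def fit_linear_probe(X, y, num_classes, l2=1e-3):
    """Fit a linear classifier by ridge regression onto one-hot targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Y = np.eye(num_classes)[y]                     # one-hot labels
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def predict(W, X):
    """Predict the class with the largest linear score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def probe_grid(features, y_train, y_val, num_classes):
    """Train one probe per (layer, timestep) key and return all scores plus the best key."""
    results = {}
    for key, (X_tr, X_va) in features.items():
        W = fit_linear_probe(X_tr, y_train, num_classes)
        pred = predict(W, X_va)
        results[key] = {
            "accuracy": float(np.mean(pred == y_val)),
            "macro_f1": macro_f1(y_val, pred, num_classes),
        }
    best = max(results, key=lambda k: results[k]["macro_f1"])
    return results, best


# Synthetic stand-in data: long-tailed labels and made-up "frozen features".
rng = np.random.default_rng(0)
y_train = rng.choice(NUM_CLASSES, size=300, p=[0.7, 0.2, 0.1])
y_val = rng.choice(NUM_CLASSES, size=150, p=[0.7, 0.2, 0.1])


def synthetic_pair(signal, dim=16):
    """Hypothetical features: class means scaled by `signal`, plus unit Gaussian noise."""
    means = rng.normal(size=(NUM_CLASSES, dim))
    X_tr = signal * means[y_train] + rng.normal(size=(len(y_train), dim))
    X_va = signal * means[y_val] + rng.normal(size=(len(y_val), dim))
    return X_tr, X_va


# Keys are hypothetical (layer name, timestep) pairs; only one carries class signal.
features = {
    ("mid_block", 50): synthetic_pair(5.0),
    ("mid_block", 500): synthetic_pair(0.0),
    ("up_block_1", 50): synthetic_pair(0.0),
}
results, best = probe_grid(features, y_train, y_val, NUM_CLASSES)
print(best, results[best])
```

In a real setting the `features` dictionary would instead hold activations extracted from the frozen diffusion backbone at each layer and noise level. Selecting the best pair on Macro F1 rather than accuracy is a design choice made here because the abstract emphasizes long-tailed class distributions.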