| Main Authors: | Juscafresa, A. Nieto Herreros, Á. Mazcuñán Sullivan, J. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2601.13416 |
| _version_ | 1866915739729920000 |
|---|---|
| author | Juscafresa, A. Nieto Herreros, Á. Mazcuñán Sullivan, J. |
| author_facet | Juscafresa, A. Nieto Herreros, Á. Mazcuñán Sullivan, J. |
| contents | Diffusion models have emerged as state-of-the-art generative methods for image synthesis, yet their potential as general-purpose feature encoders remains underexplored. Trained for denoising and generation without labels, they can be interpreted as self-supervised learners that capture both low- and high-level structure. We show that a frozen diffusion backbone enables strong fine-grained recognition by probing intermediate denoising features across layers and timesteps and training a linear classifier for each pair. We evaluate this in a real-world plankton-monitoring setting with practical impact, using controlled and comparable training setups against established supervised and self-supervised baselines. Frozen diffusion features are competitive with supervised baselines and outperform other self-supervised methods in both balanced and naturally long-tailed settings. Out-of-distribution evaluations on temporally and geographically shifted plankton datasets further show that frozen diffusion features maintain strong accuracy and Macro F1 under substantial distribution shift. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_13416 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Diffusion Representations for Fine-Grained Image Classification: A Marine Plankton Case Study |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2601.13416 |
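
The abstract describes training one linear classifier per (layer, timestep) pair and reporting both accuracy and Macro F1 in long-tailed settings. The following is a minimal sketch of how such a grid of probes might be compared, assuming each probe's predictions on a held-out set are already available. The function names, the layer and timestep values, and the toy labels are all hypothetical, and choosing the best pair by Macro F1 is an assumption made for illustration, not a procedure stated in the record.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_probe(y_true, preds_by_pair):
    """Return the (layer, timestep) key whose predictions maximize Macro F1."""
    return max(preds_by_pair, key=lambda pair: macro_f1(y_true, preds_by_pair[pair]))


# Hypothetical long-tailed toy set: 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2

# Hypothetical predictions from two linear probes, keyed by (layer, timestep).
preds_by_pair = {
    (3, 50): [0] * 10,                       # always predicts the majority class
    (6, 100): [0] * 6 + [1] * 2 + [1] * 2,   # finds the rare class, with some errors
}

for pair, y_pred in preds_by_pair.items():
    print(pair, round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
print("best pair by Macro F1:", best_probe(y_true, preds_by_pair))
```

In this toy case both probes reach the same accuracy (0.8), but the probe that ignores the rare class scores a much lower Macro F1. This illustrates why Macro F1 is informative alongside accuracy when class frequencies are long-tailed.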