Bibliographic Details
Main authors: DeMarco, Andrea; Conti, Ian Fenech; Camilleri, Hayley; Bushi, Ardiana; Riggi, Simone
Format: Preprint
Published: 2026
Subjects: Instrumentation and Methods for Astrophysics; Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2603.29660
_version_ 1866917391092416512
author DeMarco, Andrea
Conti, Ian Fenech
Camilleri, Hayley
Bushi, Ardiana
Riggi, Simone
author_facet DeMarco, Andrea
Conti, Ian Fenech
Camilleri, Hayley
Bushi, Ardiana
Riggi, Simone
contents Next-generation radio astronomy surveys are delivering millions of resolved sources, but robust and scalable morphology analysis remains difficult across heterogeneous telescopes and imaging pipelines. We present STRADAViT, a self-supervised Vision Transformer continued-pretraining framework for learning transferable encoders from radio astronomy imagery. The framework combines mixed-survey data curation, radio astronomy-aware training-view generation, and a ViT-MAE-initialized encoder family with optional register tokens. It supports reconstruction-only, contrastive-only, and two-stage branches. Our pretraining dataset comprises radio astronomy cutouts drawn from four complementary sources. We evaluate transfer with linear probing and fine-tuning on three morphology benchmarks spanning binary and multi-class settings. Relative to the ViT-MAE initialization used for continued pretraining, the best two-stage models improve Macro-F1 in all reported linear-probe settings and in two of three fine-tuning settings, with the largest gain on RGZ DR1. Relative to DINOv2, gains are selective rather than universal: the best two-stage models achieve higher mean Macro-F1 than the strongest DINOv2 baseline on LoTSS DR2 and RGZ DR1 under linear probing, and on MiraBest and RGZ DR1 under fine-tuning. A targeted DINOv2 initialization ablation further indicates that the adaptation recipe is not specific to the ViT-MAE starting point. The ViT-MAE-based STRADAViT checkpoint is retained as the released checkpoint because it combines competitive transfer with substantially lower token count and downstream cost than the DINOv2-based alternative. These results indicate that radio astronomy-aware view generation and staged continued pretraining can provide a stronger domain-adapted starting point than off-the-shelf ViT checkpoints for radio astronomy transfer.
format Preprint
id arxiv_https___arxiv_org_abs_2603_29660
institution arXiv
publishDate 2026
record_format arxiv
title STRADAViT: Towards a Foundational Model for Radio Astronomy through Self-Supervised Transfer
topic Instrumentation and Methods for Astrophysics
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.29660