_version_ 1866908835238641664
author López-Cano, Daniel
Abramo, L. Raul
Nakazono, L.
Pérez-Ràfols, I.
Martínez-Solaeche, G.
Chaves-Montero, J.
Pieri, Matthew M.
Alcaniz, Jailson
Benitez, Narciso
Bonoli, Silvia
Carneiro, Saulo
Cenarro, Javier
Cristóbal-Hornillos, David
Daflon, Simone
Dupke, Renato
Ederoclite, Alessandro
González Delgado, Rosa
Hernán-Caballero, Antonio
Hernández-Monteagudo, Carlos
Liu, Jifeng
López-Sanjuan, Carlos
Marín-Franch, Antonio
Mendes de Oliveira, Claudia
Moles, Mariano
Roig, Fernando
Sodré Jr., Laerte
Taylor, Keith
Varela, Jesús
Vázquez Ramió, Héctor
Vilchez, Jose
Zaragoza-Cardiel, Javier
contents Modern studies in astrophysics and cosmology increasingly rely on simulations and cross-survey analyses, yet differences in data generation, instrumentation, calibration, and unmodeled physics introduce distribution mismatches between datasets (``domain shift''). In machine-learning pipelines, this occurs when the joint distribution of inputs and labels differs between the training (source) and application (target) domains, causing source-trained models to underperform on the target. Transfer learning and domain adaptation provide principled ways to mitigate this effect. We study a concrete simulation-to-observation case: semi-supervised domain adaptation (SSDA) to transfer a four-class spectral classifier -- high-redshift quasars, low-redshift quasars, galaxies, and stars -- from J-PAS mock catalogs based on DESI spectra to real J-PAS observations. Our pipeline pretrains on abundant labeled DESI$\rightarrow$J-PAS mocks and adapts to the target domain using a small labeled J-PAS subset. We benchmark SSDA against two baselines: a J-PAS--only supervised model trained with the same target-label budget, and a mocks-only model evaluated on held-out J-PAS data. On this held-out J-PAS data, SSDA achieves a macro-F1 score (balancing precision and recall) of $0.82$ and an overall true positive rate of $0.89$, compared to $0.79/0.85$ for the J-PAS--only baseline and $0.73/0.87$ for the mocks-only model. The gains are driven primarily by improved quasar classification, especially in the high-redshift subclass ($\mathrm{F1}=0.66$ vs.\ $0.55/0.37$), yielding better-calibrated candidate lists for spectroscopic targeting (e.g., WEAVE-QSO) and AGN searches. This study shows how modest target supervision enables robust, data-efficient simulation-to-observation transfer when simulations are plentiful but target labels are scarce.
format Preprint
id arxiv_https___arxiv_org_abs_2602_13902
institution arXiv
publishDate 2026
record_format arxiv
title J-PAS: Semi-Supervised Sim-to-Obs Transfer for Robust Star--Galaxy--Quasar Classification
topic Instrumentation and Methods for Astrophysics
Cosmology and Nongalactic Astrophysics
url https://arxiv.org/abs/2602.13902