Enregistré dans:
Détails bibliographiques
Auteurs principaux: Young, Sam, Jwa, Yeon-jae, Terao, Kazuhiro
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2502.02558
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866914385325195264
author Young, Sam
Jwa, Yeon-jae
Terao, Kazuhiro
author_facet Young, Sam
Jwa, Yeon-jae
Terao, Kazuhiro
contents Effective self-supervised learning (SSL) techniques have been key to unlocking large datasets for representation learning. While many promising methods have been developed using online corpora and captioned photographs, their application to scientific domains, where data encodes highly specialized knowledge, remains a challenge. Liquid Argon Time Projection Chambers (LArTPCs) provide high-resolution 3D imaging for fundamental physics, but analysis of their sparse, complex point cloud data often relies on supervised methods trained on large simulations, introducing potential biases. We introduce the Point-based Liquid Argon Masked Autoencoder (PoLAr-MAE), applying masked point modeling to unlabeled LArTPC images using domain-specific volumetric tokenization and energy prediction. We show this SSL approach learns physically meaningful trajectory representations directly from data. This yields remarkable data efficiency: fine-tuning on just 100 labeled events achieves track/shower semantic segmentation performance comparable to the state-of-the-art supervised baseline trained on $>$100,000 events. Furthermore, internal attention maps exhibit emergent instance segmentation of particle trajectories. While challenges remain, particularly for fine-grained features, we make concrete SSL's potential for building a foundation model for LArTPC image analysis capable of serving as a common base for all data reconstruction tasks. To facilitate further progress, we release PILArNet-M, a large dataset of 1M LArTPC events. Project site: https://youngsm.com/polarmae.
format Preprint
id arxiv_https___arxiv_org_abs_2502_02558
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Particle Trajectory Representation Learning with Masked Point Modeling
Young, Sam
Jwa, Yeon-jae
Terao, Kazuhiro
High Energy Physics - Experiment
Computer Vision and Pattern Recognition
Machine Learning
Effective self-supervised learning (SSL) techniques have been key to unlocking large datasets for representation learning. While many promising methods have been developed using online corpora and captioned photographs, their application to scientific domains, where data encodes highly specialized knowledge, remains a challenge. Liquid Argon Time Projection Chambers (LArTPCs) provide high-resolution 3D imaging for fundamental physics, but analysis of their sparse, complex point cloud data often relies on supervised methods trained on large simulations, introducing potential biases. We introduce the Point-based Liquid Argon Masked Autoencoder (PoLAr-MAE), applying masked point modeling to unlabeled LArTPC images using domain-specific volumetric tokenization and energy prediction. We show this SSL approach learns physically meaningful trajectory representations directly from data. This yields remarkable data efficiency: fine-tuning on just 100 labeled events achieves track/shower semantic segmentation performance comparable to the state-of-the-art supervised baseline trained on $>$100,000 events. Furthermore, internal attention maps exhibit emergent instance segmentation of particle trajectories. While challenges remain, particularly for fine-grained features, we make concrete SSL's potential for building a foundation model for LArTPC image analysis capable of serving as a common base for all data reconstruction tasks. To facilitate further progress, we release PILArNet-M, a large dataset of 1M LArTPC events. Project site: https://youngsm.com/polarmae.
title Particle Trajectory Representation Learning with Masked Point Modeling
topic High Energy Physics - Experiment
Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2502.02558