Bibliographic Details
Main Authors: Birk, Joschka; Hallin, Anna; Kasieczka, Gregor
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2403.05618
author Birk, Joschka
Hallin, Anna
Kasieczka, Gregor
contents Foundation models are multi-dataset and multi-task machine learning methods that once pre-trained can be fine-tuned for a large variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough as they could improve the achievable physics performance while at the same time drastically reduce the required amount of training time and data. We report significant progress on this challenge on several fronts. First, a comprehensive set of evaluation methods is introduced to judge the quality of an encoding from physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization compared to previous works. Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-$α$ model. This is the first successful transfer between two different and actively studied classes of tasks and constitutes a major step in the building of foundation models for particle physics.
format Preprint
id arxiv_https___arxiv_org_abs_2403_05618
institution arXiv
publishDate 2024
record_format arxiv
title OmniJet-$α$: The first cross-task foundation model for particle physics
topic High Energy Physics - Phenomenology
Machine Learning
High Energy Physics - Experiment
Data Analysis, Statistics and Probability
url https://arxiv.org/abs/2403.05618