Bibliographic details
Main authors: Ozer, Derin; Lamprier, Sylvain; Cauchy, Thomas; Gutowski, Nicolas; Da Mota, Benoit
Format: Preprint
Published: 2025
Online access: https://arxiv.org/abs/2503.05810
contents The accurate prediction of chemical reaction outcomes is a major challenge in computational chemistry. Current models rely heavily on either highly specific reaction templates or template-free methods, both of which present limitations. To address these, this work proposes the Broad Reaction Set (BRS), a set featuring 20 generic reaction templates written in SMARTS, a pattern-based notation designed to describe substructures and reactivity. Additionally, we introduce ProPreT5, a T5-based model specifically adapted for chemistry and, to the best of our knowledge, the first language model capable of directly handling and applying SMARTS reaction templates. To further improve generalization, we propose the first augmentation strategy for SMARTS, which injects structural diversity at the pattern level. Trained on augmented templates, ProPreT5 demonstrates strong predictive performance and generalization to unseen reactions. Together, these contributions provide a novel and practical alternative to current methods, advancing the field of template-based reaction prediction.
id arxiv_https___arxiv_org_abs_2503_05810
institution arXiv
record_format arxiv
title A Transformer Model for Predicting Chemical Products from Generic SMARTS Templates with Data Augmentation
topic Machine Learning
Artificial Intelligence
Chemical Physics