| Main authors: | Ozer, Derin; Lamprier, Sylvain; Cauchy, Thomas; Gutowski, Nicolas; Da Mota, Benoit |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Artificial Intelligence; Chemical Physics |
| Online access: | https://arxiv.org/abs/2503.05810 |
| _version_ | 1866912602001506304 |
|---|---|
| author | Ozer, Derin; Lamprier, Sylvain; Cauchy, Thomas; Gutowski, Nicolas; Da Mota, Benoit |
| author_facet | Ozer, Derin; Lamprier, Sylvain; Cauchy, Thomas; Gutowski, Nicolas; Da Mota, Benoit |
| contents | The accurate prediction of chemical reaction outcomes is a major challenge in computational chemistry. Current models rely heavily on either highly specific reaction templates or template-free methods, both of which present limitations. To address these, this work proposes the Broad Reaction Set (BRS), a set featuring 20 generic reaction templates written in SMARTS, a pattern-based notation designed to describe substructures and reactivity. Additionally, we introduce ProPreT5, a T5-based model specifically adapted for chemistry and, to the best of our knowledge, the first language model capable of directly handling and applying SMARTS reaction templates. To further improve generalization, we propose the first augmentation strategy for SMARTS, which injects structural diversity at the pattern level. Trained on augmented templates, ProPreT5 demonstrates strong predictive performance and generalization to unseen reactions. Together, these contributions provide a novel and practical alternative to current methods, advancing the field of template-based reaction prediction. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_05810 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | A Transformer Model for Predicting Chemical Products from Generic SMARTS Templates with Data Augmentation |
| topic | Machine Learning; Artificial Intelligence; Chemical Physics |
| url | https://arxiv.org/abs/2503.05810 |