Staff View: Leveraging Large Language Models for enzymatic reaction prediction and characterization :: Library Catalog

Bibliographic Details
Main Authors:	Di Fruscia, Lorenzo; Weber, Jana Marie
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning; Biomolecules
Online Access:	https://arxiv.org/abs/2505.05616

_version_	1866909605818269696
author	Di Fruscia, Lorenzo; Weber, Jana Marie
contents	Predicting enzymatic reactions is crucial for applications in biocatalysis, metabolic engineering, and drug discovery, yet it remains a complex and resource-intensive task. Large Language Models (LLMs) have recently demonstrated remarkable success in various scientific domains, e.g., through their ability to generalize knowledge, reason over complex structures, and leverage in-context learning strategies. In this study, we systematically evaluate the capability of LLMs, particularly the Llama-3.1 family (8B and 70B), across three core biochemical tasks: Enzyme Commission number prediction, forward synthesis, and retrosynthesis. We compare single-task and multitask learning strategies, employing parameter-efficient fine-tuning via LoRA adapters. Additionally, we assess performance across different data regimes to explore their adaptability in low-data settings. Our results demonstrate that fine-tuned LLMs capture biochemical knowledge, with multitask learning enhancing forward- and retrosynthesis predictions by leveraging shared enzymatic information. We also identify key limitations, for example, challenges in hierarchical EC classification schemes, highlighting areas for further improvement in LLM-driven biochemical modeling.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_05616
institution	arXiv
publishDate	2025
record_format	arxiv
title	Leveraging Large Language Models for enzymatic reaction prediction and characterization
topic	Artificial Intelligence; Machine Learning; Biomolecules
url	https://arxiv.org/abs/2505.05616

Similar Items