Staff View: ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model :: Library Catalog

Bibliographic Details
Main Authors:	Fan, Chuanliu; Cao, Ziqiang; Ma, Zicheng; Yu, Nan; Peng, Yimin; Zhang, Jun; Gao, Yiqin; Fu, Guohong
Format:	Preprint
Published:	2025
Subjects:	Computational Engineering, Finance, and Science; Machine Learning
Online Access:	https://arxiv.org/abs/2502.19794

_version_	1866912250089963520
author	Fan, Chuanliu; Cao, Ziqiang; Ma, Zicheng; Yu, Nan; Peng, Yimin; Zhang, Jun; Gao, Yiqin; Fu, Guohong
author_facet	Fan, Chuanliu; Cao, Ziqiang; Ma, Zicheng; Yu, Nan; Peng, Yimin; Zhang, Jun; Gao, Yiqin; Fu, Guohong
contents	Goal-oriented de novo molecule design, namely generating molecules with specific property or substructure constraints, is a crucial yet challenging task in drug discovery. Existing methods, such as Bayesian optimization and reinforcement learning, often require training multiple property predictors and struggle to incorporate substructure constraints. Inspired by the success of Large Language Models (LLMs) in text generation, we propose ChatMol, a novel approach that leverages LLMs for molecule design across diverse constraint settings. Initially, we crafted a molecule representation compatible with LLMs and validated its efficacy across multiple online LLMs. Afterwards, we developed specific prompts geared towards diverse constrained molecule generation tasks to further fine-tune current LLMs while integrating feedback learning derived from property prediction. Finally, to address the limitations of LLMs in numerical recognition, we drew on the position encoding method and incorporated additional encoding for numerical values within the prompt. Experimental results across single-property, substructure-property, and multi-property constrained tasks demonstrate that ChatMol consistently outperforms state-of-the-art baselines, including VAE and RL-based methods. Notably, in the multi-objective binding affinity maximization task, ChatMol achieves a significantly lower KD value of 0.25 for the protein target ESR1, while maintaining the highest overall performance, surpassing previous methods by 4.76%. Meanwhile, with numerical enhancement, the Pearson correlation coefficient between the instructed property values and those of the generated molecules increased by up to 0.49. These findings highlight the potential of LLMs as a versatile framework for molecule generation, offering a promising alternative to traditional latent space and RL-based approaches.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_19794
institution	arXiv
publishDate	2025
record_format	arxiv
title	ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model
topic	Computational Engineering, Finance, and Science; Machine Learning
url	https://arxiv.org/abs/2502.19794

Similar Items