Bibliographic Details
Main Authors: Ribeiro, Rafael Bicudo; Cezar, Henrique Musseli
Format: Preprint
Published: 2025
Subjects: Computational Physics
Online Access: https://arxiv.org/abs/2504.14978
contents Clustering techniques have become established as a powerful strategy for analyzing the extensive data generated by molecular modeling. In particular, several tools have been developed to cluster configurations from classical simulations, typically focusing on individual units ranging from small molecules to complex proteins. Since the standard approach computes the Root Mean Square Deviation (RMSD) of atomic positions, accounting for permutations of atoms is crucial for optimizing the clustering procedure in the presence of identical molecules. To address this issue, we present the clusttraj program, a solvent-informed clustering package that fixes inflated RMSD values by finding the optimal pairing between configurations. The program combines reordering schemes with the Kabsch algorithm to minimize the RMSD between molecular configurations before running a hierarchical clustering protocol. Evaluation metrics make it possible to determine the ideal threshold automatically and to compare the available linkage schemes. The program's capabilities are illustrated with solute-solvent systems ranging from pure water clusters to a solvated protein or a small solute in different solvents. In these examples, we investigate the dependence on parameters such as the system size and the reordering method, as well as the representativeness of the cluster medoids for characterizing optical properties. clusttraj is implemented as a Python library and can be used to cluster generic ensembles of molecular configurations beyond solute-solvent systems.
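The reordering-plus-Kabsch step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not clusttraj's implementation or API: it assumes NumPy and SciPy are available, uses the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) as one possible reordering scheme, only allows swaps between atoms of the same element, and assumes both configurations list the same elements in the same order. The assignment is done on centered but unrotated coordinates, a heuristic that may miss the optimal pairing when the two configurations differ by a large rotation. The function names `kabsch_rmsd` and `reorder_hungarian` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Rows of P and Q are assumed to be paired atom by atom.
    """
    # Remove translation by centering both configurations.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


def reorder_hungarian(elements, P, Q):
    """Hypothetical helper: index array `order` such that Q[order] pairs with P.

    Atoms may only be exchanged with atoms of the same element; the
    assignment minimizes the summed distances between centered coordinates.
    """
    elements = np.asarray(elements)
    order = np.arange(len(elements))
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    for el in np.unique(elements):
        idx = np.flatnonzero(elements == el)
        cost = cdist(Pc[idx], Qc[idx])  # Euclidean distances within one element
        rows, cols = linear_sum_assignment(cost)
        order[idx[rows]] = idx[cols]
    return order


# Two water molecules (O, H, H, O, H, H) with random coordinates.
rng = np.random.default_rng(0)
elements = ["O", "H", "H", "O", "H", "H"]
P = rng.normal(scale=3.0, size=(6, 3))

# Same configuration with the molecules swapped and H labels permuted, then translated.
perm = [3, 5, 4, 0, 2, 1]
Q = P[perm] + np.array([1.0, -2.0, 0.5])

naive = kabsch_rmsd(P, Q)  # inflated: atoms are paired with the wrong partners
order = reorder_hungarian(elements, P, Q)
reordered = kabsch_rmsd(P, Q[order])  # pairing recovered, RMSD close to zero

# Kabsch alone removes a pure rotation when the pairing is already correct.
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
rotated = kabsch_rmsd(P, P @ Rz.T)

print(f"naive RMSD:     {naive:.3f}")
print(f"reordered RMSD: {reordered:.2e}")
print(f"rotated RMSD:   {rotated:.2e}")
```

In a full workflow, minimized RMSD values like these would fill a pairwise distance matrix that is then passed to a hierarchical clustering routine (for example, `scipy.cluster.hierarchy.linkage` on the condensed matrix); that step is omitted here.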
institution arXiv
title clusttraj: A Solvent-Informed Clustering Tool for Molecular Modeling
topic Computational Physics