| Main Authors: | Chen, Ruiyang; Zhang, Qingyuan; Chen, Ji |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Chemical Physics |
| Online Access: | https://arxiv.org/abs/2604.03027 |
| _version_ | 1866911565084622848 |
|---|---|
| author | Chen, Ruiyang; Zhang, Qingyuan; Chen, Ji |
| author_facet | Chen, Ruiyang; Zhang, Qingyuan; Chen, Ji |
| contents | Machine learning force fields (MLFFs) have emerged as powerful data-driven tools for atomistic simulations, enabling large-scale and complex atomic systems to be simulated with accuracy comparable to \textit{ab initio} methods. However, MLFFs often suffer from low training efficiency in the phase transition regime, where structural fluctuations are significantly elevated. To address this challenge, we propose a Central-Peripheral Distillation (CPD) algorithm for training dataset distillation. By strategically integrating representative samples with critical corner cases, the CPD algorithm ensures that the distilled dataset retains maximum structural diversity. We validated the efficacy of the CPD method on the liquid-liquid phase transition of dense hydrogen. Results show that, with the CPD approach, only 200 configurations are sufficient to train an MLFF that can fully reproduce the structural and dynamical properties of liquid hydrogen in the vicinity of its phase transition regime. This work paves the way for high-fidelity labeling of MLFF training datasets, for instance by adopting high-level \textit{ab initio} calculations beyond standard density functional theory, thereby enhancing the predictive accuracy of MLFFs. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2604_03027 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Dataset Distillation for Machine Learning Force Field in Phase Transition Regime |
| topic | Chemical Physics |
| url | https://arxiv.org/abs/2604.03027 |
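
The abstract describes CPD only at a high level: a distilled dataset that combines representative samples with corner cases. The record gives no algorithmic detail, so the following is a hypothetical illustration of that general idea and not the authors' method. It assumes each configuration has already been reduced to a numeric feature vector, treats distance to the dataset centroid as the measure of how "central" or "peripheral" a sample is, and uses an invented function name, `central_peripheral_select`.

```python
import numpy as np


def central_peripheral_select(features, n_central, n_peripheral):
    """Return indices of 'central' and 'peripheral' samples.

    Hypothetical sketch: the centroid-distance criterion is an assumption
    made for illustration, not the selection rule used in the preprint.
    """
    features = np.asarray(features, dtype=float)
    n_samples = features.shape[0]
    if n_central + n_peripheral > n_samples:
        raise ValueError("cannot select more samples than are available")

    # Distance of every sample to the centroid of the whole dataset.
    centroid = features.mean(axis=0)
    dist_to_centroid = np.linalg.norm(features - centroid, axis=1)

    # Sort sample indices from closest to farthest from the centroid.
    order = np.argsort(dist_to_centroid, kind="stable")

    # Central part: the samples closest to the centroid.
    central = order[:n_central]

    # Peripheral part: the farthest samples among those not yet chosen.
    remaining = order[n_central:]
    peripheral = remaining[::-1][:n_peripheral]

    return central, peripheral


if __name__ == "__main__":
    # Toy one-dimensional descriptors; real descriptors would be
    # higher-dimensional structural fingerprints.
    toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
    central, peripheral = central_peripheral_select(toy, 2, 2)
    print("central:", central.tolist())
    print("peripheral:", peripheral.tolist())
```

A single global centroid is the simplest possible choice here. A dataset spanning two liquid phases would more plausibly need per-cluster centroids, which this sketch does not attempt.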