Saved in:
| Main Authors: | , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.20143 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911127011590144 |
|---|---|
| author | Wang, Ruobing Tan, Qiaoyu Wang, Yili Wang, Ying Wang, Xin |
| author_facet | Wang, Ruobing Tan, Qiaoyu Wang, Yili Wang, Ying Wang, Xin |
| contents | Designing crystal materials with desired physicochemical properties remains a fundamental challenge in materials science. While large language models (LLMs) have demonstrated strong in-context learning (ICL) capabilities, existing LLM-based crystal generation approaches are limited to zero-shot scenarios and are unable to benefit from few-shot scenarios. In contrast, human experts typically design new materials by modifying relevant known structures which aligns closely with the few-shot ICL paradigm. Motivated by this, we propose CrystalICL, a novel model designed for few-shot crystal generation. Specifically, we introduce a space-group based crystal tokenization method, which effectively reduces the complexity of modeling crystal symmetry in LLMs. We further introduce a condition-structure aware hybrid instruction tuning framework and a multi-task instruction tuning strategy, enabling the model to better exploit ICL by capturing structure-property relationships from limited data. Extensive experiments on four crystal generation benchmarks demonstrate the superiority of CrystalICL over the leading baseline methods on conditional and unconditional generation tasks. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2508_20143 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | CrystalICL: Enabling In-Context Learning for Crystal Generation Wang, Ruobing Tan, Qiaoyu Wang, Yili Wang, Ying Wang, Xin Machine Learning Materials Science Designing crystal materials with desired physicochemical properties remains a fundamental challenge in materials science. While large language models (LLMs) have demonstrated strong in-context learning (ICL) capabilities, existing LLM-based crystal generation approaches are limited to zero-shot scenarios and are unable to benefit from few-shot scenarios. In contrast, human experts typically design new materials by modifying relevant known structures which aligns closely with the few-shot ICL paradigm. Motivated by this, we propose CrystalICL, a novel model designed for few-shot crystal generation. Specifically, we introduce a space-group based crystal tokenization method, which effectively reduces the complexity of modeling crystal symmetry in LLMs. We further introduce a condition-structure aware hybrid instruction tuning framework and a multi-task instruction tuning strategy, enabling the model to better exploit ICL by capturing structure-property relationships from limited data. Extensive experiments on four crystal generation benchmarks demonstrate the superiority of CrystalICL over the leading baseline methods on conditional and unconditional generation tasks. |
| title | CrystalICL: Enabling In-Context Learning for Crystal Generation |
| topic | Machine Learning Materials Science |
| url | https://arxiv.org/abs/2508.20143 |