Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Ha, Sumin; Kim, Jun Hyeong; Piao, Yinhua; Kim, Sun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Atomic Physics
Online Access:	https://arxiv.org/abs/2503.04780

_version_	1866912263314604032
author	Ha, Sumin; Kim, Jun Hyeong; Piao, Yinhua; Kim, Sun
author_facet	Ha, Sumin; Kim, Jun Hyeong; Piao, Yinhua; Kim, Sun
contents	Human expertise in chemistry and biomedicine relies on contextual molecular understanding, a capability that large language models (LLMs) can extend through fine-grained alignment between molecular structures and text. Recent advances in multimodal learning focus on cross-modal alignment, but existing molecule-text models rely on single-view representations and ignore the complementary information in different molecular views, which limits molecular understanding. Moreover, naïve multi-view alignment strategies face two challenges: (1) separate aligned spaces with inconsistent mappings between molecule and text embeddings, and (2) existing loss objectives that fail to preserve complementary information for fine-grained alignment. These issues can limit the LLM's ability to fully understand molecular properties. To address them, we propose MV-CLAM, a novel framework that aligns multi-view molecular representations into a unified textual space using a multi-query transformer (MQ-Former). Our approach ensures cross-view consistency, while a token-level contrastive loss preserves diverse molecular features across textual queries. MV-CLAM enhances molecular reasoning, improving retrieval and captioning accuracy. The source code of MV-CLAM is available at https://github.com/sumin124/mv-clam.git.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_04780
institution	arXiv
publishDate	2025
record_format	arxiv
title	MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model
topic	Computation and Language; Artificial Intelligence; Atomic Physics
url	https://arxiv.org/abs/2503.04780