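Staff View: Representation learning to advance multi-institutional studies with electronic health record data from US and France :: Library Catalog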

Bibliographic Details
Main Authors:	Zhou, Doudou; Tong, Han; Wang, Linshanshan; Liu, Suqi; Xiong, Xin; Gan, Ziming; Griffier, Romain; Hejblum, Boris; Liu, Yun-Chung; Hong, Chuan; Bonzel, Clara-Lea; Cai, Tianrun; Pan, Kevin; Ho, Yuk-Lam; Costa, Lauren; Panickan, Vidul A.; Gaziano, J. Michael; Mandl, Kenneth; Jouhet, Vianney; Thiebaut, Rodolphe; Xia, Zongqi; Cho, Kelly; Liao, Katherine; Cai, Tianxi
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.08547

_version_	1866911565678116864
author	Zhou, Doudou; Tong, Han; Wang, Linshanshan; Liu, Suqi; Xiong, Xin; Gan, Ziming; Griffier, Romain; Hejblum, Boris; Liu, Yun-Chung; Hong, Chuan; Bonzel, Clara-Lea; Cai, Tianrun; Pan, Kevin; Ho, Yuk-Lam; Costa, Lauren; Panickan, Vidul A.; Gaziano, J. Michael; Mandl, Kenneth; Jouhet, Vianney; Thiebaut, Rodolphe; Xia, Zongqi; Cho, Kelly; Liao, Katherine; Cai, Tianxi
contents	The widespread adoption of electronic health records has created new opportunities for translational clinical research, yet this promise remains constrained by fragmented data across privacy-siloed institutions and substantial heterogeneity in local coding practices. While privacy-preserving collaborative learning allows institutions to work together without sharing patient-level data, it does not address inconsistencies in how clinical concepts are represented across sites. We introduce a graph-based framework that addresses this gap by treating data harmonization as a scalable representation learning problem. Rather than relying on fixed standards or manual mappings, the framework integrates institution-specific summary statistics from health records, curated biomedical knowledge graphs, and semantic information derived from large language models to learn a shared semantic space. This joint learning approach aligns diverse, site-specific vocabularies while preserving patient privacy. Evaluated across seven institutions and two languages, the framework provides a robust, data-centric foundation for training and deploying clinical models across heterogeneous healthcare systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_08547
institution	arXiv
publishDate	2025
record_format	arxiv
title	Representation learning to advance multi-institutional studies with electronic health record data from US and France
topic	Artificial Intelligence
url	https://arxiv.org/abs/2502.08547

Similar Items