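| Main Authors: | Zhou, Doudou; Tong, Han; Wang, Linshanshan; Liu, Suqi; Xiong, Xin; Gan, Ziming; Griffier, Romain; Hejblum, Boris; Liu, Yun-Chung; Hong, Chuan; Bonzel, Clara-Lea; Cai, Tianrun; Pan, Kevin; Ho, Yuk-Lam; Costa, Lauren; Panickan, Vidul A.; Gaziano, J. Michael; Mandl, Kenneth; Jouhet, Vianney; Thiebaut, Rodolphe; Xia, Zongqi; Cho, Kelly; Liao, Katherine; Cai, Tianxi |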
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2502.08547 |
| _version_ | 1866911565678116864 |
|---|---|
| author | Zhou, Doudou; Tong, Han; Wang, Linshanshan; Liu, Suqi; Xiong, Xin; Gan, Ziming; Griffier, Romain; Hejblum, Boris; Liu, Yun-Chung; Hong, Chuan; Bonzel, Clara-Lea; Cai, Tianrun; Pan, Kevin; Ho, Yuk-Lam; Costa, Lauren; Panickan, Vidul A.; Gaziano, J. Michael; Mandl, Kenneth; Jouhet, Vianney; Thiebaut, Rodolphe; Xia, Zongqi; Cho, Kelly; Liao, Katherine; Cai, Tianxi |
| contents | The widespread adoption of electronic health records has created new opportunities for translational clinical research, yet this promise remains constrained by fragmented data across privacy-siloed institutions and substantial heterogeneity in local coding practices. While privacy-preserving collaborative learning allows institutions to work together without sharing patient-level data, it does not address inconsistencies in how clinical concepts are represented across sites. We introduce a graph-based framework that addresses this gap by treating data harmonization as a scalable representation learning problem. Rather than relying on fixed standards or manual mappings, the framework integrates institution-specific summary statistics from health records, curated biomedical knowledge graphs, and semantic information derived from large language models to learn a shared semantic space. This joint learning approach aligns diverse, site-specific vocabularies while preserving patient privacy. Evaluated across seven institutions and two languages, the framework provides a robust, data-centric foundation for training and deploying clinical models across heterogeneous healthcare systems. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_08547 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Representation learning to advance multi-institutional studies with electronic health record data from US and France |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2502.08547 |