Staff View: Computing patient similarity based on unstructured clinical notes :: Library Catalog

Bibliographic Details
Main Authors:	Zelina, Petr; Řeháček, Marko; Halámková, Jana; Bohovicová, Lucia; Rusinko, Martin; Nováček, Vít
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.07385

_version_	1866908759612194816
author	Zelina, Petr; Řeháček, Marko; Halámková, Jana; Bohovicová, Lucia; Rusinko, Martin; Nováček, Vít
author_facet	Zelina, Petr; Řeháček, Marko; Halámková, Jana; Bohovicová, Lucia; Rusinko, Martin; Nováček, Vít
contents	Clinical notes hold rich yet unstructured details about diagnoses, treatments, and outcomes that are vital to precision medicine but hard to exploit at scale. We introduce a method that represents each patient as a matrix built from aggregated embeddings of all their notes, enabling robust patient similarity computation based on their latent low-rank representations. Using clinical notes of 4,267 Czech breast-cancer patients and expert similarity labels from Masaryk Memorial Cancer Institute, we evaluate several matrix-based similarity measures and analyze their strengths and limitations across different similarity facets, such as clinical history, treatment, and adverse events. The results demonstrate the usefulness of the presented method for downstream tasks, such as personalized therapy recommendations or toxicity warnings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_07385
institution	arXiv
publishDate	2026
record_format	arxiv
title	Computing patient similarity based on unstructured clinical notes
topic	Machine Learning
url	https://arxiv.org/abs/2601.07385

Similar Items