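Staff View: A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study :: Library Catalog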

Bibliographic Details
Main Authors:	Fan, Yongda; Wu, John; Fitzpatrick, Andrea; Baskaran, Naveen; Sun, Jimeng; Cross, Adam
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.24828

_version_	1866912982830678016
author	Fan, Yongda; Wu, John; Fitzpatrick, Andrea; Baskaran, Naveen; Sun, Jimeng; Cross, Adam
contents	Clinical decisions are high-stakes and require explicit justification, making model interpretability essential for auditing deep clinical models prior to deployment. As the ecosystem of model architectures and explainability methods expands, critical questions remain: Do architectural features like attention improve explainability? Do interpretability approaches generalize across clinical tasks? While prior benchmarking efforts exist, they often lack extensibility and reproducibility and, critically, fail to systematically examine how interpretability varies across the interplay of clinical tasks and model architectures. To address these gaps, we present a comprehensive benchmark evaluating interpretability methods across diverse clinical prediction tasks and model architectures. Our analysis reveals that (1) attention, when leveraged properly, is a highly efficient approach for faithfully interpreting model predictions; (2) black-box interpreters like KernelSHAP and LIME are computationally infeasible for time-series clinical prediction tasks; and (3) several interpretability approaches are too unreliable to be trustworthy. From our findings, we discuss several guidelines on improving interpretability within clinical predictive pipelines. To support reproducibility and extensibility, we provide our implementations via PyHealth, a well-documented open-source framework: https://github.com/sunlabuiuc/PyHealth.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_24828
institution	arXiv
publishDate	2026
record_format	arxiv
title	A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2603.24828

Similar Items