Bibliographic Details
Main authors: Arnaout, Hiba; Goel, Anmol; Schwartz, H. Andrew; Eberhardt, Steffen T.; Atzil-Slonim, Dana; Doherty, Gavin; Schwartz, Brian; Lutz, Wolfgang; Althoff, Tim; De Choudhury, Munmun; Jamalabadi, Hamidreza; Shah, Raj Sanjay; Plaza-del-Arco, Flor Miriam; Hovy, Dirk; Liakata, Maria; Gurevych, Iryna
Format: Preprint
Published: 2026
Subjects: Computers and Society; Artificial Intelligence
Online access: https://arxiv.org/abs/2602.00065
_version_ 1866910171504050176
author Arnaout, Hiba
Goel, Anmol
Schwartz, H. Andrew
Eberhardt, Steffen T.
Atzil-Slonim, Dana
Doherty, Gavin
Schwartz, Brian
Lutz, Wolfgang
Althoff, Tim
De Choudhury, Munmun
Jamalabadi, Hamidreza
Shah, Raj Sanjay
Plaza-del-Arco, Flor Miriam
Hovy, Dirk
Liakata, Maria
Gurevych, Iryna
contents Although artificial intelligence (AI) shows growing promise for mental health care, current approaches to evaluating AI tools in this domain remain fragmented and poorly aligned with clinical practice, social context, and first-hand user experience. This paper argues for a rethinking of responsible evaluation -- what is measured, by whom, and for what purpose -- by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity, providing a structured basis for evaluation. Through an analysis of 135 recent *CL publications, we identify recurring limitations, including over-reliance on generic metrics that do not capture clinical validity, therapeutic appropriateness, or user experience, limited participation from mental health professionals, and insufficient attention to safety and equity. To address these gaps, we propose a taxonomy of AI mental health support types -- assessment-, intervention-, and information synthesis-oriented -- each with distinct risks and evaluative requirements, and illustrate its use through case studies.
format Preprint
id arxiv_https___arxiv_org_abs_2602_00065
institution arXiv
publishDate 2026
record_format arxiv
title Responsible Evaluation of AI for Mental Health
topic Computers and Society
Artificial Intelligence
url https://arxiv.org/abs/2602.00065