:: Library Catalog

Bibliographic Details
Main Authors:	Kim, Eugenia; Tanase, Ioana; Mallon, Christina
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2605.12702

Similar Items

Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
by: Vaccaro, Michelle, et al.
Published: (2026)

Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews
by: Martinez-Maldonado, Roberto, et al.
Published: (2026)

Improving Ontology Requirements Engineering with OntoChat and Participatory Prompting
by: Zhao, Yihang, et al.
Published: (2024)

Harmful Traits of AI Companions
by: Knox, W. Bradley, et al.
Published: (2025)

A Scalable Framework for Evaluating Health Language Models
by: Mallinar, Neil, et al.
Published: (2025)

Creating Disability Story Videos with Generative AI: Motivation, Expression, and Sharing
by: Niu, Shuo, et al.
Published: (2026)

Participatory provenance as representational auditing for AI-mediated public consultation
by: Mahajan, Sachit
Published: (2026)

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
by: Ford, Casey, et al.
Published: (2026)

Channelling, Coordinating, Collaborating: A Three-Layer Framework for Disability-Centered Human-Agent Collaboration
by: Xiao, Lan, et al.
Published: (2026)

PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation
by: Yang, Yu, et al.
Published: (2026)

A Multi-Agent Large Language Model Framework for Automated Qualitative Analysis
by: Xu, Qidi, et al.
Published: (2025)

Underspecified Human Decision Experiments Considered Harmful
by: Hullman, Jessica, et al.
Published: (2024)

Evalet: Evaluating Large Language Models through Functional Fragmentation
by: Kim, Tae Soo, et al.
Published: (2025)

Trust in Vision-Language Models: Insights from a Participatory User Workshop
by: Chiatti, Agnese, et al.
Published: (2025)

The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers
by: Prather, James, et al.
Published: (2024)

EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
by: Kim, Tae Soo, et al.
Published: (2023)

A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness
by: Winslow, Brent, et al.
Published: (2025)

AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development
by: Saxena, Devansh, et al.
Published: (2025)

Agentic AI Framework for Individuals with Disabilities and Neurodivergence: A Multi-Agent System for Healthy Eating, Daily Routines, and Inclusive Well-Being
by: Jan, Salman, et al.
Published: (2025)

Data-Driven and Participatory Approaches toward Neuro-Inclusive AI
by: Rizvi, Naba
Published: (2025)

LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models
by: Sun, Chongyan, et al.
Published: (2024)

AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression
by: Yoo, Dong Whi, et al.
Published: (2025)

Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing
by: Jafari, Kiana, et al.
Published: (2026)

EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models
by: Xiong, Wei, et al.
Published: (2025)

Rewriting Conversational Utterances with Instructed Large Language Models
by: Galimzhanova, Elnara, et al.
Published: (2024)

AI Meets the Classroom: When Do Large Language Models Harm Learning?
by: Lehmann, Matthias, et al.
Published: (2024)

Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions
by: Wang, Kaixuan, et al.
Published: (2025)

A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
by: Huang, Zhongzhan, et al.
Published: (2025)

Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms
by: Rismani, Shalaleh, et al.
Published: (2025)

From Fake Perfects to Conversational Imperfects: Exploring Image-Generative AI as a Boundary Object for Participatory Design of Public Spaces
by: Guridi, Jose A., et al.
Published: (2024)

SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents
by: Ai, Kuangshi, et al.
Published: (2026)

Benchmarking Large Language Models for Diagnosing Students' Cognitive Skills from Handwritten Math Work
by: Kim, Yoonsu, et al.
Published: (2025)

Cripping AI: Reimagining AI Through Lived Disability Experiences
by: Tang, Xinru, et al.
Published: (2026)

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
by: Chiu, Yu Ying, et al.
Published: (2025)

Beyond Final Answers: Evaluating Large Language Models for Math Tutoring
by: Gupta, Adit, et al.
Published: (2025)

Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics
by: Palani, Srishti, et al.
Published: (2026)

An Empirical Examination of the Evaluative AI Framework
by: Kornowicz, Jaroslaw
Published: (2024)

VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
by: Sung, Yoo Yeon, et al.
Published: (2025)

Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South
by: Rastogi, Charvi, et al.
Published: (2026)

Disability data futures: Achievable imaginaries for AI and disability data justice
by: Newman-Griffis, Denis, et al.
Published: (2024)