| Main Authors: | Kim, Eugenia, Tanase, Ioana, Mallon, Christina |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.12702 |
Similar Items
Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
by: Vaccaro, Michelle, et al.
Published: (2026)
Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews
by: Martinez-Maldonado, Roberto, et al.
Published: (2026)
Improving Ontology Requirements Engineering with OntoChat and Participatory Prompting
by: Zhao, Yihang, et al.
Published: (2024)
Harmful Traits of AI Companions
by: Knox, W. Bradley, et al.
Published: (2025)
A Scalable Framework for Evaluating Health Language Models
by: Mallinar, Neil, et al.
Published: (2025)
Creating Disability Story Videos with Generative AI: Motivation, Expression, and Sharing
by: Niu, Shuo, et al.
Published: (2026)
Participatory provenance as representational auditing for AI-mediated public consultation
by: Mahajan, Sachit
Published: (2026)
Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
by: Ford, Casey, et al.
Published: (2026)
Channelling, Coordinating, Collaborating: A Three-Layer Framework for Disability-Centered Human-Agent Collaboration
by: Xiao, Lan, et al.
Published: (2026)
PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation
by: Yang, Yu, et al.
Published: (2026)
A Multi-Agent Large Language Model Framework for Automated Qualitative Analysis
by: Xu, Qidi, et al.
Published: (2025)
Underspecified Human Decision Experiments Considered Harmful
by: Hullman, Jessica, et al.
Published: (2024)
Evalet: Evaluating Large Language Models through Functional Fragmentation
by: Kim, Tae Soo, et al.
Published: (2025)
Trust in Vision-Language Models: Insights from a Participatory User Workshop
by: Chiatti, Agnese, et al.
Published: (2025)
The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers
by: Prather, James, et al.
Published: (2024)
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
by: Kim, Tae Soo, et al.
Published: (2023)
A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness
by: Winslow, Brent, et al.
Published: (2025)
AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development
by: Saxena, Devansh, et al.
Published: (2025)
Agentic AI Framework for Individuals with Disabilities and Neurodivergence: A Multi-Agent System for Healthy Eating, Daily Routines, and Inclusive Well-Being
by: Jan, Salman, et al.
Published: (2025)
Data-Driven and Participatory Approaches toward Neuro-Inclusive AI
by: Rizvi, Naba
Published: (2025)
LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models
by: Sun, Chongyan, et al.
Published: (2024)
AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression
by: Yoo, Dong Whi, et al.
Published: (2025)
Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing
by: Jafari, Kiana, et al.
Published: (2026)
EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models
by: Xiong, Wei, et al.
Published: (2025)
Rewriting Conversational Utterances with Instructed Large Language Models
by: Galimzhanova, Elnara, et al.
Published: (2024)
AI Meets the Classroom: When Do Large Language Models Harm Learning?
by: Lehmann, Matthias, et al.
Published: (2024)
Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions
by: Wang, Kaixuan, et al.
Published: (2025)
A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
by: Huang, Zhongzhan, et al.
Published: (2025)
Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms
by: Rismani, Shalaleh, et al.
Published: (2025)
From Fake Perfects to Conversational Imperfects: Exploring Image-Generative AI as a Boundary Object for Participatory Design of Public Spaces
by: Guridi, Jose A., et al.
Published: (2024)
SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents
by: Ai, Kuangshi, et al.
Published: (2026)
Benchmarking Large Language Models for Diagnosing Students' Cognitive Skills from Handwritten Math Work
by: Kim, Yoonsu, et al.
Published: (2025)
Cripping AI: Reimagining AI Through Lived Disability Experiences
by: Tang, Xinru, et al.
Published: (2026)
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
by: Chiu, Yu Ying, et al.
Published: (2025)
Beyond Final Answers: Evaluating Large Language Models for Math Tutoring
by: Gupta, Adit, et al.
Published: (2025)
Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics
by: Palani, Srishti, et al.
Published: (2026)
An Empirical Examination of the Evaluative AI Framework
by: Kornowicz, Jaroslaw
Published: (2024)
VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
by: Sung, Yoo Yeon, et al.
Published: (2025)
Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South
by: Rastogi, Charvi, et al.
Published: (2026)
Disability data futures: Achievable imaginaries for AI and disability data justice
by: Newman-Griffis, Denis, et al.
Published: (2024)