Bibliographic Details
Main Authors: Wang, Haining; Clark, Jason; Peña, Angelica
Format: Preprint
Published: 2026
Subjects: Digital Libraries; Software Engineering
Online Access: https://arxiv.org/abs/2602.18935
author Wang, Haining
Clark, Jason
Peña, Angelica
contents As libraries explore large language models (LLMs) as a scalable layer for reference services, a core fairness question follows: can LLM-based services support all patrons fairly, regardless of demographic identity? While LLMs offer great potential for broadening access to information assistance, they may also reproduce societal biases embedded in their training data, potentially undermining libraries' commitments to impartial service. In this chapter, we apply a systematic evaluation approach that combines diagnostic classification to detect systematic differences with linguistic analysis to interpret their sources. Across three widely used open models (Llama-3.1 8B, Gemma-2 9B, and Ministral 8B), we find no compelling evidence of systematic differentiation by race/ethnicity, and only minor evidence of sex-linked differentiation in one model. We discuss implications for responsible AI adoption in libraries and the importance of ongoing monitoring in aligning LLM-based services with core professional values.
format Preprint
id arxiv_https___arxiv_org_abs_2602_18935
institution arXiv
publishDate 2026
record_format arxiv
title Responsible Intelligence in Practice: A Fairness Audit of Open Large Language Models for Library Reference Services
topic Digital Libraries
Software Engineering
url https://arxiv.org/abs/2602.18935