Staff View: LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios :: Library Catalog


Bibliographic Details
Main Authors:	Mirza, Vishal; Kulkarni, Rahul; Jadhav, Aakanksha
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.14583

_version_	1866910270528421888
author	Mirza, Vishal; Kulkarni, Rahul; Jadhav, Aakanksha
author_facet	Mirza, Vishal; Kulkarni, Rahul; Jadhav, Aakanksha
contents	LLM bias evaluation is critical as large language models (LLMs) increasingly influence high-stakes decisions. This paper provides a comprehensive assessment of gender, racial, and age disparities in leading LLMs, revealing that debiasing efforts often create new fairness trade-offs. Recent advancements in LLMs have been notable, yet widespread enterprise adoption remains limited due to various constraints. This paper examines bias in LLMs - a crucial issue affecting their usability, reliability, and fairness. Our study evaluates gender bias in occupational scenarios and gender, age, and racial bias in crime scenarios across four leading LLMs released in 2024: Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o. Findings reveal that LLMs often depict female characters more frequently than male ones in various occupations, showing a 37% deviation from US BLS data. In crime scenarios, deviations from US FBI data are 54% for gender, 28% for race, and 17% for age. Critically, we observe that efforts to reduce gender and racial bias often lead to outcomes that may over-index one sub-class, potentially exacerbating disparities - a "debiasing paradox" that highlights the limitations of current bias mitigation techniques and underscores the need for more effective approaches.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_14583
institution	arXiv
publishDate	2024
record_format	arxiv
title	LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios
topic	Artificial Intelligence
url	https://arxiv.org/abs/2409.14583
