Bibliographic Details
Main Authors: Mirza, Vishal; Kulkarni, Rahul; Jadhav, Aakanksha
Format: Preprint
Published: 2024
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.14583
author Mirza, Vishal
Kulkarni, Rahul
Jadhav, Aakanksha
contents LLM bias evaluation is critical as large language models (LLMs) increasingly influence high-stakes decisions. This paper provides a comprehensive assessment of gender, racial, and age disparities in leading LLMs, revealing that debiasing efforts often create new fairness trade-offs. Recent advancements in LLMs have been notable, yet widespread enterprise adoption remains limited due to various constraints. This paper examines bias in LLMs, a crucial issue affecting their usability, reliability, and fairness. Our study evaluates gender bias in occupational scenarios and gender, age, and racial bias in crime scenarios across four leading LLMs released in 2024: Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o. Findings reveal that LLMs often depict female characters more frequently than male ones in various occupations, showing a 37% deviation from US Bureau of Labor Statistics (BLS) data. In crime scenarios, deviations from US FBI data are 54% for gender, 28% for race, and 17% for age. Critically, we observe that efforts to reduce gender and racial bias often lead to outcomes that may over-index one sub-class, potentially exacerbating disparities. This "debiasing paradox" highlights the limitations of current bias mitigation techniques and underscores the need for more effective approaches.
format Preprint
id arxiv_https___arxiv_org_abs_2409_14583
institution arXiv
publishDate 2024
record_format arxiv
title LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios
topic Artificial Intelligence
url https://arxiv.org/abs/2409.14583