Staff View: Identity-Robust Language Model Generation via Content Integrity Preservation :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Miao; Chen, Kelly; Tanjim, Md Mehrab; Chunara, Rumi
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.09141

_version_	1866917203240026112
author	Zhang, Miao; Chen, Kelly; Tanjim, Md Mehrab; Chunara, Rumi
author_facet	Zhang, Miao; Chen, Kelly; Tanjim, Md Mehrab; Chunara, Rumi
contents	Large Language Model (LLM) outputs often vary across user sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even for objective questions where demographic information is irrelevant. Unlike prior work on stereotypical or representational bias, this paper studies identity-dependent degradation of core response quality. We show empirically that such degradation arises from biased generation behavior, despite factual knowledge being robustly encoded across identities. Motivated by this mismatch, we propose a lightweight, training-free framework for identity-robust generation that selectively neutralizes non-critical identity information while preserving semantically essential attributes, thus maintaining output content integrity. Experiments across four benchmarks and 18 sociodemographic identities demonstrate an average 77% reduction in identity-dependent bias compared to vanilla prompting and a 45% reduction relative to prompt-based defenses. Our work addresses a critical gap in mitigating the impact of user identity cues in prompts on core generation quality.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_09141
institution	arXiv
publishDate	2026
record_format	arxiv
title	Identity-Robust Language Model Generation via Content Integrity Preservation
topic	Computation and Language
url	https://arxiv.org/abs/2601.09141

Similar Items