Library Catalog

Bibliographic Details
Main Authors:	Liu, Guangliang, Chen, Bocheng, Zi, Han, Zhang, Xitong, Johnson, Kristen Marie
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2509.21456

Similar Items

Learning to Diagnose and Correct Errors: Towards Moral Sensitivity Acquisition in Large Language Models
by: Chen, Bocheng, et al.
Published: (2026)

Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links
by: Liu, Guangliang, et al.
Published: (2025)

Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization
by: Liu, Guangliang, et al.
Published: (2025)

Discourse Heuristics For Paradoxically Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2025)

Smaller Large Language Models Can Do Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2024)

On the Convergence of Moral Self-Correction in Large Language Models
by: Liu, Guangliang, et al.
Published: (2025)

Self-correction is Not An Innate Capability in Language Models
by: Liu, Guangliang, et al.
Published: (2024)

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
by: Liu, Guangliang, et al.
Published: (2024)

Can Large Language Models Handle Discourse Particles? A Case Study of Colloquial Malay
by: Yusoff, Mariah Al Giptiah Binte, et al.
Published: (2026)

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
by: Liu, Guangliang, et al.
Published: (2024)

Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
by: Liu, Guangliang, et al.
Published: (2024)

Challenging Negative Gender Stereotypes: A Study on the Effectiveness of Automated Counter-Stereotypes
by: Nejadgholi, Isar, et al.
Published: (2024)

Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian
by: Kalhor, Ghazal, et al.
Published: (2025)

A Survey to Recent Progress Towards Understanding In-Context Learning
by: Mao, Haitao, et al.
Published: (2024)

Local Contrastive Editing of Gender Stereotypes
by: Lutz, Marlene, et al.
Published: (2024)

Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems
by: Sankaran, Aditya Narayan, et al.
Published: (2024)

LLMs Reproduce Stereotypes of Sexual and Gender Minorities
by: Ostrow, Ruby, et al.
Published: (2025)

No Free Lunch for Defending Against Prefilling Attack by In-Context Learning
by: Xue, Zhiyu, et al.
Published: (2024)

An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case
by: Giachino, Gioele, et al.
Published: (2025)

Gender-Neutral Rewriting in Italian: Models, Approaches, and Trade-offs
by: Piergentili, Andrea, et al.
Published: (2025)

Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models
by: Parihar, Shweta, et al.
Published: (2026)

An Empirical Study of Gendered Stereotypes in Emotional Attributes for Bangla in Multilingual Large Language Models
by: Sadhu, Jayanta, et al.
Published: (2024)

On the Interplay of Human-AI Alignment, Fairness, and Performance Trade-offs in Medical Imaging
by: Luo, Haozhe, et al.
Published: (2025)

Alignment Reduces Expressed but Not Encoded Gender Bias: A Unified Framework and Study
by: Bouchouchi, Nour, et al.
Published: (2026)

Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes
by: Jeoung, Sullam, et al.
Published: (2025)

Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark
by: Liu, Yangyang, et al.
Published: (2026)

Evaluation of Large Language Models: STEM education and Gender Stereotypes
by: Due, Smilla, et al.
Published: (2024)

More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models
by: Chen, Evan, et al.
Published: (2025)

Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling
by: Pikuliak, Matúš, et al.
Published: (2023)

Quantifying Stereotypes in Language
by: Liu, Yang
Published: (2024)

The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs
by: Mahmoud, Omar, et al.
Published: (2025)

Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes
by: Masoudian, Shahed, et al.
Published: (2025)

Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
by: Zakizadeh, Mahdi, et al.
Published: (2025)

Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments
by: Kumar, Divyanshu, et al.
Published: (2026)

Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources
by: Hansen, Lasse Hyldig, et al.
Published: (2024)

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health
by: Ngo, Trung Hieu, et al.
Published: (2026)

Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution
by: Plaza-del-Arco, Flor Miriam, et al.
Published: (2024)

Graph Neural Networks for Misinformation Detection: Performance-Efficiency Trade-offs
by: Kuntur, Soveatin, et al.
Published: (2026)

The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data
by: Luca, Massimiliano, et al.
Published: (2025)

Histoires Morales: A French Dataset for Assessing Moral Alignment
by: Leteno, Thibaud, et al.
Published: (2025)