| Main Authors: | Liu, Guangliang, Chen, Bocheng, Zi, Han, Zhang, Xitong, Johnson, Kristen Marie |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.21456 |
Similar Items
Learning to Diagnose and Correct Errors: Towards Moral Sensitivity Acquisition in Large Language Models
by: Chen, Bocheng, et al.
Published: (2026)
Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links
by: Liu, Guangliang, et al.
Published: (2025)
Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization
by: Liu, Guangliang, et al.
Published: (2025)
Discourse Heuristics For Paradoxically Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2025)
Smaller Large Language Models Can Do Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2024)
On the Convergence of Moral Self-Correction in Large Language Models
by: Liu, Guangliang, et al.
Published: (2025)
Self-correction is Not An Innate Capability in Language Models
by: Liu, Guangliang, et al.
Published: (2024)
Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
by: Liu, Guangliang, et al.
Published: (2024)
Can Large Language Models Handle Discourse Particles? A Case Study of Colloquial Malay
by: Yusoff, Mariah Al Giptiah Binte, et al.
Published: (2026)
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
by: Liu, Guangliang, et al.
Published: (2024)
Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
by: Liu, Guangliang, et al.
Published: (2024)
Challenging Negative Gender Stereotypes: A Study on the Effectiveness of Automated Counter-Stereotypes
by: Nejadgholi, Isar, et al.
Published: (2024)
Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian
by: Kalhor, Ghazal, et al.
Published: (2025)
A Survey to Recent Progress Towards Understanding In-Context Learning
by: Mao, Haitao, et al.
Published: (2024)
Local Contrastive Editing of Gender Stereotypes
by: Lutz, Marlene, et al.
Published: (2024)
Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems
by: Sankaran, Aditya Narayan, et al.
Published: (2024)
LLMs Reproduce Stereotypes of Sexual and Gender Minorities
by: Ostrow, Ruby, et al.
Published: (2025)
No Free Lunch for Defending Against Prefilling Attack by In-Context Learning
by: Xue, Zhiyu, et al.
Published: (2024)
An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case
by: Giachino, Gioele, et al.
Published: (2025)
Gender-Neutral Rewriting in Italian: Models, Approaches, and Trade-offs
by: Piergentili, Andrea, et al.
Published: (2025)
Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models
by: Parihar, Shweta, et al.
Published: (2026)
An Empirical Study of Gendered Stereotypes in Emotional Attributes for Bangla in Multilingual Large Language Models
by: Sadhu, Jayanta, et al.
Published: (2024)
On the Interplay of Human-AI Alignment, Fairness, and Performance Trade-offs in Medical Imaging
by: Luo, Haozhe, et al.
Published: (2025)
Alignment Reduces Expressed but Not Encoded Gender Bias: A Unified Framework and Study
by: Bouchouchi, Nour, et al.
Published: (2026)
Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes
by: Jeoung, Sullam, et al.
Published: (2025)
Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark
by: Liu, Yangyang, et al.
Published: (2026)
Evaluation of Large Language Models: STEM education and Gender Stereotypes
by: Due, Smilla, et al.
Published: (2024)
More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models
by: Chen, Evan, et al.
Published: (2025)
Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling
by: Pikuliak, Matúš, et al.
Published: (2023)
Quantifying Stereotypes in Language
by: Liu, Yang
Published: (2024)
The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs
by: Mahmoud, Omar, et al.
Published: (2025)
Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes
by: Masoudian, Shahed, et al.
Published: (2025)
Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
by: Zakizadeh, Mahdi, et al.
Published: (2025)
Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments
by: Kumar, Divyanshu, et al.
Published: (2026)
Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources
by: Hansen, Lasse Hyldig, et al.
Published: (2024)
Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health
by: Ngo, Trung Hieu, et al.
Published: (2026)
Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution
by: Plaza-del-Arco, Flor Miriam, et al.
Published: (2024)
Graph Neural Networks for Misinformation Detection: Performance-Efficiency Trade-offs
by: Kuntur, Soveatin, et al.
Published: (2026)
The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data
by: Luca, Massimiliano, et al.
Published: (2025)
Histoires Morales: A French Dataset for Assessing Moral Alignment
by: Leteno, Thibaud, et al.
Published: (2025)