Library Catalog

Bibliographic Details
Main Authors:	Gong, Xilin; Yang, Shu; Cao, Zehua; Billard, Lynne; Wang, Di
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.00300

Similar Items

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)

FaithLM: Towards Faithful Explanations for Large Language Models
by: Chuang, Yu-Neng, et al.
Published: (2024)

Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models
by: Zhang, Jiayi, et al.
Published: (2025)

Mitigating the Bias of Large Language Model Evaluation
by: Zhou, Hongli, et al.
Published: (2024)

Investigating CoT Monitorability in Large Reasoning Models
by: Yang, Shu, et al.
Published: (2025)

Mitigating Large Language Model Hallucination with Faithful Finetuning
by: Hu, Minda, et al.
Published: (2024)

MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
by: Hu, Jingyu, et al.
Published: (2025)

Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models
by: Yang, Hao, et al.
Published: (2025)

Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models
by: Doi, Tomoki, et al.
Published: (2025)

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
by: Matton, Katie, et al.
Published: (2025)

Locating and Mitigating Gender Bias in Large Language Models
by: Cai, Yuchen, et al.
Published: (2024)

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
by: Agarwal, Chirag, et al.
Published: (2024)

Understanding and Mitigating Tokenization Bias in Language Models
by: Phan, Buu, et al.
Published: (2024)

Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
by: Zhang, Zhenliang, et al.
Published: (2025)

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
by: Siegel, Noah Y., et al.
Published: (2024)

Bias in Large Language Models: Origin, Evaluation, and Mitigation
by: Guo, Yufei, et al.
Published: (2024)

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
by: Yeo, Wei Jie, et al.
Published: (2024)

Mitigating Label Length Bias in Large Language Models
by: Sanz-Guerrero, Mario, et al.
Published: (2025)

Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
by: You, Liangliang, et al.
Published: (2025)

All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG
by: Wang, Dan, et al.
Published: (2026)

Understanding the Repeat Curse in Large Language Models from a Feature Perspective
by: Yao, Junchi, et al.
Published: (2025)

NeuroFaith: Evaluating LLM Self-Explanation Faithfulness via Internal Representation Alignment
by: Bhan, Milan, et al.
Published: (2025)

Self-Critique and Refinement for Faithful Natural Language Explanations
by: Wang, Yingming, et al.
Published: (2025)

Illocutionary Explanation Planning for Source-Faithful Explanations in Retrieval-Augmented Language Models
by: Sovrano, Francesco, et al.
Published: (2026)

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
by: Zhang, Junbo, et al.
Published: (2025)

Detection, Classification, and Mitigation of Gender Bias in Large Language Models
by: Cheng, Xiaoqing, et al.
Published: (2025)

Do Multilingual Large Language Models Mitigate Stereotype Bias?
by: Nie, Shangrui, et al.
Published: (2024)

Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models
by: Wang, Yuqing, et al.
Published: (2024)

Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach
by: Huang, Tianyi, et al.
Published: (2024)

Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance
by: Alon, Bar, et al.
Published: (2026)

Towards Faithful Model Explanation in NLP: A Survey
by: Lyu, Qing, et al.
Published: (2022)

Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
by: Huang, Yanwen, et al.
Published: (2025)

Large Language Model Agents Are Not Always Faithful Self-Evolvers
by: Zhao, Weixiang, et al.
Published: (2026)

Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
by: Luo, Linhao, et al.
Published: (2024)

Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
by: Xu, Yue, et al.
Published: (2025)

Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models
by: Tong, Schrasing, et al.
Published: (2024)

MBIAS: Mitigating Bias in Large Language Models While Retaining Context
by: Raza, Shaina, et al.
Published: (2024)

From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models
by: Xu, Zexing, et al.
Published: (2024)

Multi-Persona Thinking for Bias Mitigation in Large Language Models
by: Chen, Yuxing, et al.
Published: (2026)

Likelihood-based Mitigation of Evaluation Bias in Large Language Models
by: Oi, Masanari, et al.
Published: (2024)