| Main Authors: | Gong, Xilin; Yang, Shu; Cao, Zehua; Billard, Lynne; Wang, Di |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.00300 |
Similar Items
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)
FaithLM: Towards Faithful Explanations for Large Language Models
by: Chuang, Yu-Neng, et al.
Published: (2024)
Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models
by: Zhang, Jiayi, et al.
Published: (2025)
Mitigating the Bias of Large Language Model Evaluation
by: Zhou, Hongli, et al.
Published: (2024)
Investigating CoT Monitorability in Large Reasoning Models
by: Yang, Shu, et al.
Published: (2025)
Mitigating Large Language Model Hallucination with Faithful Finetuning
by: Hu, Minda, et al.
Published: (2024)
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
by: Hu, Jingyu, et al.
Published: (2025)
Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models
by: Yang, Hao, et al.
Published: (2025)
Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models
by: Doi, Tomoki, et al.
Published: (2025)
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
by: Matton, Katie, et al.
Published: (2025)
Locating and Mitigating Gender Bias in Large Language Models
by: Cai, Yuchen, et al.
Published: (2024)
Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
by: Agarwal, Chirag, et al.
Published: (2024)
Understanding and Mitigating Tokenization Bias in Language Models
by: Phan, Buu, et al.
Published: (2024)
Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
by: Zhang, Zhenliang, et al.
Published: (2025)
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
by: Siegel, Noah Y., et al.
Published: (2024)
Bias in Large Language Models: Origin, Evaluation, and Mitigation
by: Guo, Yufei, et al.
Published: (2024)
Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
by: Yeo, Wei Jie, et al.
Published: (2024)
Mitigating Label Length Bias in Large Language Models
by: Sanz-Guerrero, Mario, et al.
Published: (2025)
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
by: You, Liangliang, et al.
Published: (2025)
All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG
by: Wang, Dan, et al.
Published: (2026)
Understanding the Repeat Curse in Large Language Models from a Feature Perspective
by: Yao, Junchi, et al.
Published: (2025)
NeuroFaith: Evaluating LLM Self-Explanation Faithfulness via Internal Representation Alignment
by: Bhan, Milan, et al.
Published: (2025)
Self-Critique and Refinement for Faithful Natural Language Explanations
by: Wang, Yingming, et al.
Published: (2025)
Illocutionary Explanation Planning for Source-Faithful Explanations in Retrieval-Augmented Language Models
by: Sovrano, Francesco, et al.
Published: (2026)
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
by: Zhang, Junbo, et al.
Published: (2025)
Detection, Classification, and Mitigation of Gender Bias in Large Language Models
by: Cheng, Xiaoqing, et al.
Published: (2025)
Do Multilingual Large Language Models Mitigate Stereotype Bias?
by: Nie, Shangrui, et al.
Published: (2024)
Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models
by: Wang, Yuqing, et al.
Published: (2024)
Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach
by: Huang, Tianyi, et al.
Published: (2024)
Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance
by: Alon, Bar, et al.
Published: (2026)
Towards Faithful Model Explanation in NLP: A Survey
by: Lyu, Qing, et al.
Published: (2022)
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
by: Huang, Yanwen, et al.
Published: (2025)
Large Language Model Agents Are Not Always Faithful Self-Evolvers
by: Zhao, Weixiang, et al.
Published: (2026)
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
by: Luo, Linhao, et al.
Published: (2024)
Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
by: Xu, Yue, et al.
Published: (2025)
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models
by: Tong, Schrasing, et al.
Published: (2024)
MBIAS: Mitigating Bias in Large Language Models While Retaining Context
by: Raza, Shaina, et al.
Published: (2024)
From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models
by: Xu, Zexing, et al.
Published: (2024)
Multi-Persona Thinking for Bias Mitigation in Large Language Models
by: Chen, Yuxing, et al.
Published: (2026)
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
by: Oi, Masanari, et al.
Published: (2024)