:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Haofeng; Zhang, Yu
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.06899

Similar Items

Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs
by: Zheng, Xiang, et al.
Published: (2026)

Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments
by: Cheng, Sitao, et al.
Published: (2024)

RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores
by: Li, Yingshu, et al.
Published: (2025)

Dissociation of Faithful and Unfaithful Reasoning in LLMs
by: Yee, Evelyn, et al.
Published: (2024)

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution
by: Sivakumaran, Nithin, et al.
Published: (2026)

STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability
by: Wang, Guanghui, et al.
Published: (2025)

Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces
by: Pathak, Manas, et al.
Published: (2026)

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
by: Luo, Linhao, et al.
Published: (2023)

C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning
by: Mittal, Avni, et al.
Published: (2026)

Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method
by: Zhao, Tianzhe, et al.
Published: (2026)

FaithLens: Detecting and Explaining Faithfulness Hallucination
by: Si, Shuzheng, et al.
Published: (2025)

SO-Bench: A Structural Output Evaluation of Multimodal LLMs
by: Feng, Di, et al.
Published: (2025)

Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
by: Young, Richard J.
Published: (2026)

Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
by: Jia, Boyu, et al.
Published: (2025)

STORYSUMM: Evaluating Faithfulness in Story Summarization
by: Subbiah, Melanie, et al.
Published: (2024)

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
by: Han, Yunseok, et al.
Published: (2026)

Multimodal Reasoning with Multimodal Knowledge Graph
by: Lee, Junlin, et al.
Published: (2024)

Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models
by: Ju, Li, et al.
Published: (2026)

SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models
by: Lv, Weijiang, et al.
Published: (2026)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)

Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization
by: Mei, Xiaoyong, et al.
Published: (2026)

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
by: Arcuschin, Iván, et al.
Published: (2025)

From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition
by: Zhou, Yiqing, et al.
Published: (2025)

Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process
by: Chen, Zhijun, et al.
Published: (2025)

Bridging Legal Interpretation and Formal Logic: Faithfulness, Assumption, and the Future of AI Legal Reasoning
by: Wang, Olivia Peiyu, et al.
Published: (2026)

CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation
by: Bandyopadhyay, Dibyanayan, et al.
Published: (2025)

GeoFaith: A Spatio-Temporal Dual View of Faithful Chain-of-Thought
by: Lv, Weijiang, et al.
Published: (2026)

FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering
by: Sui, Yuan, et al.
Published: (2024)

Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
by: Luo, Weiqing, et al.
Published: (2026)

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
by: Yuan, Wenhao, et al.
Published: (2026)

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)

FaithLM: Towards Faithful Explanations for Large Language Models
by: Chuang, Yu-Neng, et al.
Published: (2024)

Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes
by: Jiao, Rui, et al.
Published: (2025)

PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
by: Wang, Yunxiao, et al.
Published: (2025)

RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners
by: Gajjar, Jugal, et al.
Published: (2026)

Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation
by: Sun, Zhouhao, et al.
Published: (2024)

On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
by: Jing, Xiaonan, et al.
Published: (2024)

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models
by: Sun, Zhouhao, et al.
Published: (2026)

FLARE: Faithful Logic-Aided Reasoning and Exploration
by: Arakelyan, Erik, et al.
Published: (2024)