| Main Authors: | Wang, Haofeng, Zhang, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.06899 |
Similar Items
Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs
by: Zheng, Xiang, et al.
Published: (2026)
Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments
by: Cheng, Sitao, et al.
Published: (2024)
RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores
by: Li, Yingshu, et al.
Published: (2025)
Dissociation of Faithful and Unfaithful Reasoning in LLMs
by: Yee, Evelyn, et al.
Published: (2024)
Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution
by: Sivakumaran, Nithin, et al.
Published: (2026)
STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability
by: Wang, Guanghui, et al.
Published: (2025)
Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces
by: Pathak, Manas, et al.
Published: (2026)
Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
by: Luo, Linhao, et al.
Published: (2023)
C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning
by: Mittal, Avni, et al.
Published: (2026)
Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method
by: Zhao, Tianzhe, et al.
Published: (2026)
FaithLens: Detecting and Explaining Faithfulness Hallucination
by: Si, Shuzheng, et al.
Published: (2025)
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
by: Feng, Di, et al.
Published: (2025)
Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
by: Young, Richard J.
Published: (2026)
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
by: Jia, Boyu, et al.
Published: (2025)
STORYSUMM: Evaluating Faithfulness in Story Summarization
by: Subbiah, Melanie, et al.
Published: (2024)
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
by: Han, Yunseok, et al.
Published: (2026)
Multimodal Reasoning with Multimodal Knowledge Graph
by: Lee, Junlin, et al.
Published: (2024)
Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models
by: Ju, Li, et al.
Published: (2026)
SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models
by: Lv, Weijiang, et al.
Published: (2026)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization
by: Mei, Xiaoyong, et al.
Published: (2026)
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
by: Arcuschin, Iván, et al.
Published: (2025)
From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition
by: Zhou, Yiqing, et al.
Published: (2025)
Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process
by: Chen, Zhijun, et al.
Published: (2025)
Bridging Legal Interpretation and Formal Logic: Faithfulness, Assumption, and the Future of AI Legal Reasoning
by: Wang, Olivia Peiyu, et al.
Published: (2026)
CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation
by: Bandyopadhyay, Dibyanayan, et al.
Published: (2025)
GeoFaith: A Spatio-Temporal Dual View of Faithful Chain-of-Thought
by: Lv, Weijiang, et al.
Published: (2026)
FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering
by: Sui, Yuan, et al.
Published: (2024)
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
by: Luo, Weiqing, et al.
Published: (2026)
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
by: Yuan, Wenhao, et al.
Published: (2026)
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)
FaithLM: Towards Faithful Explanations for Large Language Models
by: Chuang, Yu-Neng, et al.
Published: (2024)
Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes
by: Jiao, Rui, et al.
Published: (2025)
PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
by: Wang, Yunxiao, et al.
Published: (2025)
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners
by: Gajjar, Jugal, et al.
Published: (2026)
Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation
by: Sun, Zhouhao, et al.
Published: (2024)
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
by: Jing, Xiaonan, et al.
Published: (2024)
GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models
by: Sun, Zhouhao, et al.
Published: (2026)
FLARE: Faithful Logic-Aided Reasoning and Exploration
by: Arakelyan, Erik, et al.
Published: (2024)