| Main Authors: | Lewis, Martha, Mitchell, Melanie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.08955 |
Similar Items
Evaluating the Robustness of Analogical Reasoning in Large Language Models
by: Lewis, Martha, et al.
Published: (2024)
Counterfactual Simulatability of LLM Explanations for Generation Tasks
by: Limpijankit, Marvin, et al.
Published: (2025)
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
by: Wu, Zhaofeng, et al.
Published: (2023)
Natural Language Counterfactual Explanations for Graphs Using Large Language Models
by: Giorgi, Flavio, et al.
Published: (2024)
Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models
by: Yu, Junchi, et al.
Published: (2023)
Modeling Understanding of Story-Based Analogies Using Large Language Models
by: Inani, Kalit, et al.
Published: (2025)
Counterfactual Token Generation in Large Language Models
by: Chatzi, Ivi, et al.
Published: (2024)
COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models
by: Fayyazi, Arya, et al.
Published: (2026)
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
by: Han, Yunseok, et al.
Published: (2026)
Reasoning Capabilities of Large Language Models on Dynamic Tasks
by: Wong, Annie, et al.
Published: (2025)
Aligning Large Language Models with Counterfactual DPO
by: Butcher, Bradley
Published: (2024)
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
by: Zhu, Kaijie, et al.
Published: (2023)
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
by: Liu, Junlin, et al.
Published: (2026)
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection
by: Yang, Bo, et al.
Published: (2025)
Eliciting Causal Abilities in Large Language Models for Reasoning Tasks
by: Wang, Yajing, et al.
Published: (2024)
CLOMO: Counterfactual Logical Modification with Large Language Models
by: Huang, Yinya, et al.
Published: (2023)
An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques
by: Aly, Walid Mohamed, et al.
Published: (2025)
Generative Evaluation of Complex Reasoning in Large Language Models
by: Lin, Haowei, et al.
Published: (2025)
MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning
by: Ghosh, Rajarshi, et al.
Published: (2025)
Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments
by: Domnich, Marharyta, et al.
Published: (2024)
The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning
by: Merrill, Scott, et al.
Published: (2026)
Evaluating Ill-Defined Tasks in Large Language Models
by: Zhou, Yi, et al.
Published: (2026)
Derivational Morphology Reveals Analogical Generalization in Large Language Models
by: Hofmann, Valentin, et al.
Published: (2024)
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models
by: Li, Yiyuan, et al.
Published: (2024)
Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study
by: Liu, Yinuo, et al.
Published: (2026)
Reasoning Elicitation in Language Models via Counterfactual Feedback
by: Hüyük, Alihan, et al.
Published: (2024)
Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models
by: Feng, Yijun
Published: (2025)
Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications
by: Malghan, Arjun R.
Published: (2025)
Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation
by: Ong, Keane, et al.
Published: (2025)
Evaluating Consistency and Reasoning Capabilities of Large Language Models
by: Saxena, Yash, et al.
Published: (2024)
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
by: Kim, Hyunwoo, et al.
Published: (2025)
Evaluating Large Language Models for Real-World Engineering Tasks
by: Heesch, Rene, et al.
Published: (2025)
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
by: Chen, Yanxi, et al.
Published: (2025)
Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?
by: Greatrix, Thomas, et al.
Published: (2024)
Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
by: Alfugaha, Rawand, et al.
Published: (2025)
Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
by: Nickel, Christian, et al.
Published: (2026)
Counterfactual-Consistency Prompting for Relative Temporal Understanding in Large Language Models
by: Kim, Jongho, et al.
Published: (2025)
Using Contrastive Learning to Improve Two-Way Reasoning in Large Language Models: The Obfuscation Task as a Case Study
by: Nikiema, Serge Lionel, et al.
Published: (2025)
A Comprehensive Evaluation on Event Reasoning of Large Language Models
by: Tao, Zhengwei, et al.
Published: (2024)
GLoRE: Evaluating Logical Reasoning of Large Language Models
by: Liu, Hanmeng, et al.
Published: (2023)