| Main Author: | Tang, Yuzhe |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.14761 |
Similar Items
Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not
by: Karakaş, Sercan
Published: (2026)
LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning
by: Li, Jiachun, et al.
Published: (2024)
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
by: Wang, Yuqing, et al.
Published: (2023)
ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases
by: Do, Quyet V., et al.
Published: (2024)
From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI
by: Krause, Stefanie, et al.
Published: (2024)
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
by: Li, Haoyang, et al.
Published: (2025)
Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?
by: Jayaweera, Chathuri, et al.
Published: (2025)
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
by: Zhou, Kaiwen, et al.
Published: (2023)
Com$^2$: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models
by: Xiong, Kai, et al.
Published: (2025)
GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning
by: Tang, Zhisheng, et al.
Published: (2024)
Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
by: Alfugaha, Rawand, et al.
Published: (2025)
Medical Reasoning with Large Language Models: A Survey and MR-Bench
by: Ren, Xiaohan, et al.
Published: (2026)
Detecting Emotional Incongruity of Sarcasm by Commonsense Reasoning
by: Qiu, Ziqi, et al.
Published: (2024)
CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
by: Chen, Kesheng, et al.
Published: (2026)
Benchmarking Chinese Commonsense Reasoning with a Multi-hop Reasoning Perspective
by: You, Wangjie, et al.
Published: (2025)
LOGICAL-COMMONSENSEQA: A Benchmark for Logical Commonsense Reasoning
by: Junias, Obed, et al.
Published: (2026)
Zero-shot Commonsense Reasoning over Machine Imagination
by: Park, Hyuntae, et al.
Published: (2024)
Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
by: Mouselinos, Spyridon, et al.
Published: (2024)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
by: Parmar, Mihir, et al.
Published: (2024)
VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models
by: Dong, Nguyen Tien, et al.
Published: (2025)
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
by: Chu, Zheng, et al.
Published: (2023)
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs
by: Fang, Tianqing, et al.
Published: (2024)
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
by: Palta, Shramay, et al.
Published: (2024)
mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
by: Sakai, Yusuke, et al.
Published: (2024)
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
by: Joshi, Abhinav, et al.
Published: (2025)
ANAH: Analytical Annotation of Hallucinations in Large Language Models
by: Ji, Ziwei, et al.
Published: (2024)
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
by: Xu, Xin, et al.
Published: (2025)
MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?
by: Yong, Xixian, et al.
Published: (2025)
Dissecting Failure Dynamics in Large Language Model Reasoning
by: Zhu, Wei, et al.
Published: (2026)
Agentic Reasoning for Large Language Models
by: Wei, Tianxin, et al.
Published: (2026)
OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)
LTLBench: Towards Benchmarks for Evaluating Temporal Reasoning in Large Language Models
by: Tang, Weizhi, et al.
Published: (2024)
Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning
by: Li, Jiachun, et al.
Published: (2024)
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
by: Almheiri, Saeed, et al.
Published: (2025)
MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning
by: Nair, Inderjeet, et al.
Published: (2024)
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
by: Gu, Yuzhe, et al.
Published: (2024)
Can Language Models Take A Hint? Prompting for Controllable Contextualized Commonsense Inference
by: Colon-Hernandez, Pedro, et al.
Published: (2024)
Zero, Finite, and Infinite Belief History of Theory of Mind Reasoning in Large Language Models
by: Tang, Weizhi, et al.
Published: (2024)
Closing the Confidence-Faithfulness Gap in Large Language Models
by: Miao, Miranda Muqing, et al.
Published: (2026)
ACCORD: Closing the Commonsense Measurability Gap
by: Roewer-Després, François, et al.
Published: (2024)