:: Library Catalog

Bibliographic Details
Main Author:	Tang, Yuzhe
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2603.14761

Similar Items

Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not
by: Karakaş, Sercan
Published: (2026)

LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning
by: Li, Jiachun, et al.
Published: (2024)

Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
by: Wang, Yuqing, et al.
Published: (2023)

ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases
by: Do, Quyet V., et al.
Published: (2024)

From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI
by: Krause, Stefanie, et al.
Published: (2024)

Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
by: Li, Haoyang, et al.
Published: (2025)

Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?
by: Jayaweera, Chathuri, et al.
Published: (2025)

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
by: Zhou, Kaiwen, et al.
Published: (2023)

Com$^2$: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models
by: Xiong, Kai, et al.
Published: (2025)

GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning
by: Tang, Zhisheng, et al.
Published: (2024)

Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
by: Alfugaha, Rawand, et al.
Published: (2025)

Medical Reasoning with Large Language Models: A Survey and MR-Bench
by: Ren, Xiaohan, et al.
Published: (2026)

Detecting Emotional Incongruity of Sarcasm by Commonsense Reasoning
by: Qiu, Ziqi, et al.
Published: (2024)

CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
by: Chen, Kesheng, et al.
Published: (2026)

Benchmarking Chinese Commonsense Reasoning with a Multi-hop Reasoning Perspective
by: You, Wangjie, et al.
Published: (2025)

LOGICAL-COMMONSENSEQA: A Benchmark for Logical Commonsense Reasoning
by: Junias, Obed, et al.
Published: (2026)

Zero-shot Commonsense Reasoning over Machine Imagination
by: Park, Hyuntae, et al.
Published: (2024)

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
by: Mouselinos, Spyridon, et al.
Published: (2024)

LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
by: Parmar, Mihir, et al.
Published: (2024)

VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models
by: Dong, Nguyen Tien, et al.
Published: (2025)

TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
by: Chu, Zheng, et al.
Published: (2023)

Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs
by: Fang, Tianqing, et al.
Published: (2024)

Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
by: Palta, Shramay, et al.
Published: (2024)

mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
by: Sakai, Yusuke, et al.
Published: (2024)

Towards Quantifying Commonsense Reasoning with Mechanistic Insights
by: Joshi, Abhinav, et al.
Published: (2025)

ANAH: Analytical Annotation of Hallucinations in Large Language Models
by: Ji, Ziwei, et al.
Published: (2024)

UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
by: Xu, Xin, et al.
Published: (2025)

MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?
by: Yong, Xixian, et al.
Published: (2025)

Dissecting Failure Dynamics in Large Language Model Reasoning
by: Zhu, Wei, et al.
Published: (2026)

Agentic Reasoning for Large Language Models
by: Wei, Tianxin, et al.
Published: (2026)

OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)

LTLBench: Towards Benchmarks for Evaluating Temporal Reasoning in Large Language Models
by: Tang, Weizhi, et al.
Published: (2024)

Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning
by: Li, Jiachun, et al.
Published: (2024)

Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
by: Almheiri, Saeed, et al.
Published: (2025)

MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning
by: Nair, Inderjeet, et al.
Published: (2024)

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
by: Gu, Yuzhe, et al.
Published: (2024)

Can Language Models Take A Hint? Prompting for Controllable Contextualized Commonsense Inference
by: Colon-Hernandez, Pedro, et al.
Published: (2024)

Zero, Finite, and Infinite Belief History of Theory of Mind Reasoning in Large Language Models
by: Tang, Weizhi, et al.
Published: (2024)

Closing the Confidence-Faithfulness Gap in Large Language Models
by: Miao, Miranda Muqing, et al.
Published: (2026)

ACCORD: Closing the Commonsense Measurability Gap
by: Roewer-Després, François, et al.
Published: (2024)