| Main Authors: | Cohn, Anthony G, Blackwell, Robert E |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.19589 |
Similar Items
Evaluating the Ability of Large Language Models to Reason about Cardinal Directions
by: Cohn, Anthony G, et al.
Published: (2024)
Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited
by: Cohn, Anthony G, et al.
Published: (2025)
Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores
by: Blackwell, Robert E., et al.
Published: (2024)
QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi
by: Cohn, Anthony G., et al.
Published: (2026)
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
by: Li, Fangjun, et al.
Published: (2024)
Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning
by: Li, Fangjun, et al.
Published: (2024)
Can Large Language Models Generalize Procedures Across Representations?
by: Lin, Fangru, et al.
Published: (2026)
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
by: Lin, Fangru, et al.
Published: (2024)
Can Language Models Reason about Individualistic Human Values and Preferences?
by: Jiang, Liwei, et al.
Published: (2024)
Large Reasoning Models Struggle to Transfer Parametric Knowledge Across Scripts
by: Bandarkar, Lucas, et al.
Published: (2026)
Code Simulation Challenges for Large Language Models
by: La Malfa, Emanuele, et al.
Published: (2024)
Large Language Models Can Learn Temporal Reasoning
by: Xiong, Siheng, et al.
Published: (2024)
BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks
by: Du, Weihong, et al.
Published: (2025)
Performance Comparison of Large Language Models on Advanced Calculus Problems
by: Moon, In Hak
Published: (2025)
NExT: Teaching Large Language Models to Reason about Code Execution
by: Ni, Ansong, et al.
Published: (2024)
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
by: Wang, Jingyuan, et al.
Published: (2025)
An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning
by: Kaur, Navdeep, et al.
Published: (2025)
Can Large Language Models do Analytical Reasoning?
by: Hu, Yebowen, et al.
Published: (2024)
Can Large Language Models Reason and Plan?
by: Kambhampati, Subbarao
Published: (2024)
WeatherQA: Can Multimodal Language Models Reason about Severe Weather?
by: Ma, Chengqian, et al.
Published: (2024)
Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?
by: Zhang, Yanjian, et al.
Published: (2025)
Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction
by: Haris, Erum, et al.
Published: (2024)
Break the Chain: Large Language Models Can be Shortcut Reasoners
by: Ding, Mengru, et al.
Published: (2024)
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
by: Zhu, Junda, et al.
Published: (2025)
Benchmarking Large Language Models for Calculus Problem-Solving: A Comparative Analysis
by: Moon, In Hak
Published: (2025)
Can Large Language Models Act as Symbolic Reasoners?
by: Sullivan, Rob, et al.
Published: (2024)
What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models
by: Yun, Tian, et al.
Published: (2025)
Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models
by: Jiang, Song, et al.
Published: (2023)
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?
by: Su, Zhaochen, et al.
Published: (2024)
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
by: He, Yancheng, et al.
Published: (2025)
Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses
by: Han, Bin, et al.
Published: (2025)
LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning about Actions
by: Ishay, Adam, et al.
Published: (2025)
Reasoning Can Hurt the Inductive Abilities of Large Language Models
by: Jin, Haibo, et al.
Published: (2025)
Large Language Models Can Self-Improve in Long-context Reasoning
by: Li, Siheng, et al.
Published: (2024)
SMILE-Next: Teaching Large Language Models to Detect, Classify, and Reason about Laughter
by: Jung-Mok, Lee, et al.
Published: (2026)
Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models
by: Klisura, Đorđe, et al.
Published: (2025)
BAT: Learning to Reason about Spatial Sounds with Large Language Models
by: Zheng, Zhisheng, et al.
Published: (2024)
Reasonable Space for the $λ$-Calculus, Logarithmically
by: Accattoli, Beniamino, et al.
Published: (2022)
PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes
by: Diallo, Aissatou, et al.
Published: (2024)
Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?
by: Greatrix, Thomas, et al.
Published: (2024)