Bibliographic Details
Main Author: Moreira, Benjamin Grando
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2510.24435
author Moreira, Benjamin Grando
author_facet Moreira, Benjamin Grando
contents Evaluating the reasoning ability of Large Language Models (LLMs) is important for advancing artificial intelligence, as it goes beyond mere linguistic task performance. It involves determining whether these models truly understand information, perform inferences, and draw conclusions in a logical and valid way. This study compares the logical and abstract reasoning skills of several LLMs (GPT, Claude, DeepSeek, Gemini, Grok, Llama, Mistral, Perplexity, and Sabiá) using a set of eight custom-designed reasoning questions. The LLM results are benchmarked against human performance on the same tasks, revealing significant differences and indicating areas where LLMs struggle with deduction.
format Preprint
id arxiv_https___arxiv_org_abs_2510_24435
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning
Moreira, Benjamin Grando
Artificial Intelligence
title Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning
topic Artificial Intelligence
url https://arxiv.org/abs/2510.24435