| Main Authors: | Mondorf, Philipp, Zhou, Shijia, Riedler, Monica, Plank, Barbara |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.01445 |
Similar Items
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning
by: Mondorf, Philipp, et al.
Published: (2024)
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
by: Mondorf, Philipp, et al.
Published: (2024)
LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models
by: Rabern, Brian, et al.
Published: (2026)
The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It
by: Bertolazzi, Leonardo, et al.
Published: (2025)
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models
by: Mondorf, Philipp, et al.
Published: (2024)
Tracing Uncertainty in Language Model "Reasoning"
by: Grünefeld, Nils, et al.
Published: (2026)
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications
by: Riedler, Monica, et al.
Published: (2024)
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
by: Mondorf, Philipp, et al.
Published: (2024)
CausalARC: Abstract Reasoning with Causal World Models
by: Maasch, Jacqueline, et al.
Published: (2025)
GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning
by: Peltonen, Saku, et al.
Published: (2026)
Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models
by: Cheng, Xinyuan, et al.
Published: (2026)
If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
by: Orth, Jasmin, et al.
Published: (2025)
ARC-TGI: Human-Validated Task Generators with Reasoning Chain Templates for ARC-AGI
by: Lehmann, Jens, et al.
Published: (2026)
From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark
by: Lei, Chao, et al.
Published: (2025)
System 2 Reasoning for Human-AI Alignment: Generality and Adaptivity via ARC-AGI
by: Kim, Sejin, et al.
Published: (2024)
ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus
by: Moffitt, Michael D.
Published: (2025)
Reason to Rote: Rethinking Memorization in Reasoning
by: Du, Yupei, et al.
Published: (2025)
Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks
by: Rudakov, Evgenii, et al.
Published: (2025)
ARC: Leveraging Compositional Representations for Cross-Problem Learning on VRPs
by: Jeong, Han-Seul, et al.
Published: (2025)
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
by: Wang, Xinpeng, et al.
Published: (2025)
HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation
by: Zhang, Jiajun, et al.
Published: (2025)
ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems
by: Chollet, Francois, et al.
Published: (2025)
ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus
by: Guichard, Etienne, et al.
Published: (2025)
LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic
by: Kalyanpur, Aditya, et al.
Published: (2024)
Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning
by: Liu, Yijun, et al.
Published: (2025)
H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark
by: LeGris, Solim, et al.
Published: (2024)
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
by: Romeo, Carlo, et al.
Published: (2026)
Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task
by: Wu, Junjie, et al.
Published: (2025)
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
by: Siedler, Philipp D.
Published: (2025)
ARC Prize 2025: Technical Report
by: Chollet, François, et al.
Published: (2026)
ARC Prize 2024: Technical Report
by: Chollet, Francois, et al.
Published: (2024)
Clarify, Abstain or Answer? Strategising in Conversation with Belief-Augmented Generation
by: Baan, Joris, et al.
Published: (2026)
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory
by: Ho, Matthew, et al.
Published: (2025)
Think Visually, Reason Textually: Vision-Language Synergy in ARC
by: Zhang, Beichen, et al.
Published: (2025)
Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination
by: Chen, Qiqi, et al.
Published: (2024)
An Empirical Comparison of Generative Approaches for Product Attribute-Value Identification
by: Sabeh, Kassem, et al.
Published: (2024)
Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations
by: Khandalkar, Nikhil, et al.
Published: (2025)
JaxARC: A High-Performance JAX-based Environment for Abstraction and Reasoning Research
by: Aadam, et al.
Published: (2026)
An Analysis of Architectural Impact on LLM-based Abstract Visual Reasoning: A Systematic Benchmark on RAVEN-FAIR
by: Urgun, Sinan, et al.
Published: (2025)
MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables
by: Marcuzzo, Matteo, et al.
Published: (2025)