Bibliographic Details
Main Authors: Lee, Kevin, Spiewak, Russell, Walsh, James
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2511.20694
_version_ 1866917258614276096
author Lee, Kevin
Spiewak, Russell
Walsh, James
contents Scientific reasoning through Large Language Models in heliophysics involves more than just recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and providing clear scientific formats through coordinated approaches. To address these challenges, we present Reasoning With a Star, a newly contributed heliophysics dataset applicable to reasoning; we also provide an initial benchmarking approach. Our data are constructed from National Aeronautics and Space Administration & University Corporation for Atmospheric Research Living With a Star summer school problem sets and compiled into a readily consumable question-and-answer structure with question contexts, reasoning steps, expected answer type, ground-truth targets, format hints, and metadata. A programmatic grader checks the predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns, finding that decomposing workflows through systems engineering principles outperforms direct prompting on problems requiring deductive reasoning rather than pure inductive recall.
format Preprint
id arxiv_https___arxiv_org_abs_2511_20694
institution arXiv
publishDate 2025
record_format arxiv
title Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning
topic Artificial Intelligence
Solar and Stellar Astrophysics
Machine Learning
Space Physics
url https://arxiv.org/abs/2511.20694