| Main Authors: | Lee, Kevin; Spiewak, Russell; Walsh, James |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Artificial Intelligence; Solar and Stellar Astrophysics; Machine Learning; Space Physics |
| Online Access: | https://arxiv.org/abs/2511.20694 |

| _version_ | 1866917258614276096 |
|---|---|
| author | Lee, Kevin; Spiewak, Russell; Walsh, James |
| author_facet | Lee, Kevin; Spiewak, Russell; Walsh, James |
| contents | Scientific reasoning through Large Language Models in heliophysics involves more than just recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and providing clear scientific formats through coordinated approaches. To address these challenges, we present Reasoning With a Star, a newly contributed heliophysics dataset applicable to reasoning; we also provide an initial benchmarking approach. Our data are constructed from National Aeronautics and Space Administration & University Corporation for Atmospheric Research Living With a Star summer school problem sets and compiled into a readily consumable question-and-answer structure with question contexts, reasoning steps, expected answer type, ground-truth targets, format hints, and metadata. A programmatic grader checks the predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns, finding that decomposing workflows through systems engineering principles outperforms direct prompting on problems requiring deductive reasoning rather than pure inductive recall. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2511_20694 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning |
| topic | Artificial Intelligence; Solar and Stellar Astrophysics; Machine Learning; Space Physics |
| url | https://arxiv.org/abs/2511.20694 |
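
The abstract mentions a programmatic grader that applies unit-aware numerical tolerance and schema validation. The record does not describe how that grader is implemented, so the following is only a minimal sketch of the general idea. The field names (`question`, `answer_type`, `target`), the unit tables, and the 1% relative tolerance are all hypothetical and are not taken from the dataset or the paper.

```python
import math

# Hypothetical conversion factors to SI base units (illustration only).
TO_SI = {"m": 1.0, "km": 1.0e3, "s": 1.0, "min": 60.0}
# Hypothetical mapping from unit symbol to physical dimension.
DIMENSION = {"m": "length", "km": "length", "s": "time", "min": "time"}

# Assumed subset of a question-and-answer record schema.
REQUIRED_FIELDS = {"question", "answer_type", "target"}


def validate_record(record):
    """Return the sorted list of required fields missing from a record."""
    return sorted(REQUIRED_FIELDS - set(record))


def grade_numeric(pred_value, pred_unit, true_value, true_unit, rel_tol=0.01):
    """Compare two quantities after converting both to SI base units.

    Returns False for unknown units or mismatched dimensions; otherwise
    checks agreement within the given relative tolerance.
    """
    pred_dim = DIMENSION.get(pred_unit)
    true_dim = DIMENSION.get(true_unit)
    if pred_dim is None or true_dim is None or pred_dim != true_dim:
        return False
    pred_si = pred_value * TO_SI[pred_unit]
    true_si = true_value * TO_SI[true_unit]
    return math.isclose(pred_si, true_si, rel_tol=rel_tol)
```

Under these assumptions, `grade_numeric(1.5, "km", 1500.0, "m")` accepts the prediction because both values convert to the same length in metres, while a prediction expressed in seconds is rejected on dimension alone. A production grader would more likely rely on a dedicated units library than on hand-written tables.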