Bibliographic Details
Main Authors: Tam, Justin Z., Grosset, Pascal, Banesh, Divya, Ramachandra, Nesar, Turton, Terece L., Ahrens, James
Format: Preprint
Published: 2025
Subjects: Instrumentation and Methods for Astrophysics, Artificial Intelligence
Online Access: https://arxiv.org/abs/2510.12920
author Tam, Justin Z.
Grosset, Pascal
Banesh, Divya
Ramachandra, Nesar
Turton, Terece L.
Ahrens, James
contents Analyzing large-scale scientific datasets presents substantial challenges due to their sheer volume, structural complexity, and the need for specialized domain knowledge. Automation tools such as PandasAI typically require full data ingestion and lack context about the overall data structure, making them impractical as intelligent data analysis assistants for terabyte-scale datasets. To overcome these limitations, we propose InferA, a multi-agent system that leverages large language models to enable scalable and efficient scientific data analysis. At the core of the architecture is a supervisor agent that orchestrates a team of specialized agents, each responsible for a distinct phase of data retrieval and analysis. The system engages interactively with users to elicit their analytical intent and confirm query objectives, ensuring alignment between user goals and system actions. To demonstrate the framework's usability, we evaluate the system using ensemble runs from the HACC cosmology simulation, which comprises several terabytes of data.
format Preprint
id arxiv_https___arxiv_org_abs_2510_12920
institution arXiv
publishDate 2025
record_format arxiv
title InferA: A Smart Assistant for Cosmological Ensemble Data
topic Instrumentation and Methods for Astrophysics
Artificial Intelligence
url https://arxiv.org/abs/2510.12920