Bibliographic Details
Main Authors: Zeng, Zihang, Zhang, Jiaquan, Li, Pengze, Qi, Yuan, Chen, Xi
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2603.03233
_version_ 1866914366243209216
author Zeng, Zihang
Zhang, Jiaquan
Li, Pengze
Qi, Yuan
Chen, Xi
contents Large Language Models (LLMs) show potential for automating scientific code generation but face challenges in reliability, error propagation in multi-agent workflows, and evaluation in domains with ill-defined success metrics. We present a Low-code Platform (LCP) for AI for Science (AI4S) tasks, built on a Bayesian adversarial multi-agent framework. Three LLM-based agents are coordinated under the Bayesian framework: a Task Manager that structures user inputs into actionable plans and adaptive test cases, a Code Generator that produces candidate solutions, and an Evaluator that provides comprehensive feedback. The framework employs an adversarial loop in which the Task Manager iteratively refines test cases to challenge the Code Generator, while prompt distributions are updated dynamically using Bayesian principles that integrate three code quality metrics: functional correctness, structural alignment, and static analysis. This co-optimization of tests and code reduces dependence on LLM reliability and addresses the evaluation uncertainty inherent to scientific tasks. LCP also streamlines human-AI collaboration by translating non-expert prompts into domain-specific requirements, so practitioners without coding backgrounds do not need to engineer prompts manually. Benchmark evaluations demonstrate LCP's effectiveness in generating robust code while minimizing error propagation. The platform is also tested on an Earth Science cross-disciplinary task, where it demonstrates strong reliability and outperforms competing models.
format Preprint
id arxiv_https___arxiv_org_abs_2603_03233
institution arXiv
publishDate 2026
record_format arxiv
title AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
topic Artificial Intelligence
url https://arxiv.org/abs/2603.03233
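
The abstract above describes prompt distributions being updated with Bayesian principles from three code quality metrics. The record does not give the update rule, so the following is only a minimal sketch of one way such a loop could look, not the authors' method. It assumes a Beta posterior per prompt variant, Thompson sampling to choose a prompt, and a fractional (soft) Beta update driven by a weighted composite score. The names `PromptPosterior`, `composite_score`, `run_loop`, the prompt labels, the weights, and the stubbed metric values are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PromptPosterior:
    """Beta posterior over the quality of each prompt variant (assumed model)."""
    alpha: dict = field(default_factory=dict)
    beta: dict = field(default_factory=dict)

    def add(self, prompt: str) -> None:
        # Uniform Beta(1, 1) prior for a new prompt variant.
        self.alpha[prompt] = 1.0
        self.beta[prompt] = 1.0

    def sample(self, rng: random.Random) -> str:
        # Thompson sampling: draw once from each posterior, pick the best draw.
        return max(
            self.alpha,
            key=lambda p: rng.betavariate(self.alpha[p], self.beta[p]),
        )

    def update(self, prompt: str, score: float) -> None:
        # Soft update: a score in [0, 1] counts as a fractional success.
        # This is a common heuristic, not an exact conjugate update.
        self.alpha[prompt] += score
        self.beta[prompt] += 1.0 - score

    def mean(self, prompt: str) -> float:
        a, b = self.alpha[prompt], self.beta[prompt]
        return a / (a + b)


def composite_score(functional: float, structural: float, static: float,
                    weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted mix of the three metrics named in the abstract.

    The weights are illustrative assumptions; each metric is expected in [0, 1].
    """
    w_f, w_s, w_a = weights
    return w_f * functional + w_s * structural + w_a * static


# Stub standing in for the Evaluator agent: fixed metrics per prompt variant.
STUB_METRICS = {
    "detailed": (1.0, 0.9, 0.7),
    "terse": (0.2, 0.3, 0.1),
}


def run_loop(rounds: int, seed: int = 0) -> PromptPosterior:
    """Sample a prompt, score the (stubbed) result, update the posterior."""
    rng = random.Random(seed)
    posterior = PromptPosterior()
    for prompt in STUB_METRICS:
        posterior.add(prompt)
    for _ in range(rounds):
        prompt = posterior.sample(rng)
        score = composite_score(*STUB_METRICS[prompt])
        posterior.update(prompt, score)
    return posterior
```

In this sketch the posterior mean of a prompt variant drifts toward its average composite score, so variants that yield better-scoring code are sampled more often over time. An adversarial Task Manager would additionally change the test cases between rounds, which is omitted here.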