Bibliographic Details
Main Authors: Cifani, Susanna, Bernardi, Mario Luca, Cimitile, Marta
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence, Computation and Language
Online Access: https://arxiv.org/abs/2605.28607
author Cifani, Susanna
Bernardi, Mario Luca
Cimitile, Marta
contents Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the integration of multimodal large language models (MLLMs) has enabled agents to interact directly with graphical user interfaces (GUIs), existing approaches typically treat task sequences as discrete, linear episodes. This fragmentation prevents agents from capturing the underlying transition topology, limiting their effectiveness in novel or non-stationary scenarios. To address this, we propose a novel multimodal multi-agent framework that executes workflows automatically through a two-phase pipeline. First, during an offline discovery phase, the architecture adaptively constructs a topological knowledge base from fragmented execution logs. Then, during inference, agents apply Adaptive Retrieval-Augmented Generation (RAG) over this fixed, pre-established graph, coupled with a closed-loop collaborative verification protocol, to self-correct and navigate dynamically. This graph-based approach improves task decomposition and adaptive navigation performance. We validate the framework in a real-world context, demonstrating that it maintains high reliability and semantic awareness even with limited training data.
format Preprint
id arxiv_https___arxiv_org_abs_2605_28607
institution arXiv
publishDate 2026
record_format arxiv
title Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution
topic Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2605.28607