Bibliographic Details
Main Authors: Dai, Ruiting; Wang, Zheyu; Yang, Haoyu; Liu, Yihan; Wang, Chengzhi; Zhang, Zekun; Huang, Zishan; Cen, Jiaman; Mo, Lisi
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2602.04144
_version_ 1866912875008753664
author Dai, Ruiting
Wang, Zheyu
Yang, Haoyu
Liu, Yihan
Wang, Chengzhi
Zhang, Zekun
Huang, Zishan
Cen, Jiaman
Mo, Lisi
contents Data incompleteness severely impedes the reliability of multimodal systems. Existing reconstruction methods face distinct bottlenecks: conventional parametric/generative models are prone to hallucinations due to over-reliance on internal memory, while retrieval-augmented frameworks struggle with retrieval rigidity. Critically, these end-to-end architectures are fundamentally constrained by Semantic-Detail Entanglement, a structural conflict between logical reasoning and signal synthesis that compromises fidelity. In this paper, we present Omni-Modality Generation Agent (OMG-Agent), a novel framework that shifts the paradigm from static mapping to a dynamic coarse-to-fine Agentic Workflow. By mimicking a deliberate-then-act cognitive process, OMG-Agent explicitly decouples the task into three synergistic stages: (1) an MLLM-driven Semantic Planner that resolves input ambiguity via Progressive Contextual Reasoning, creating a deterministic structured semantic plan; (2) a non-parametric Evidence Retriever that grounds abstract semantics in external knowledge; and (3) a Retrieval-Injected Executor that utilizes retrieved evidence as flexible feature prompts to overcome rigidity and synthesize high-fidelity details. Extensive experiments on multiple benchmarks demonstrate that OMG-Agent consistently surpasses state-of-the-art methods, maintaining robustness under extreme missingness, e.g., a 2.6-point gain on CMU-MOSI at a 70% missing rate.
format Preprint
id arxiv_https___arxiv_org_abs_2602_04144
institution arXiv
publishDate 2026
record_format arxiv
title OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2602.04144
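
The three-stage workflow described in the abstract (plan, then retrieve, then execute) can be illustrated with a toy sketch. This is not the authors' implementation: the keyword rule stands in for the MLLM-driven Semantic Planner, a three-entry in-memory list stands in for the external knowledge source, and a similarity-weighted average stands in for the Retrieval-Injected Executor. All names (`MEMORY_BANK`, `SemanticPlan`, `plan`, `retrieve`, `execute`, `generate_missing`) and all feature values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical memory bank: (label, feature vector) pairs standing in
# for the external knowledge the Evidence Retriever would search.
MEMORY_BANK = [
    ("positive", [0.9, 0.1, 0.0]),
    ("negative", [0.1, 0.9, 0.0]),
    ("neutral", [0.3, 0.3, 0.4]),
]


@dataclass
class SemanticPlan:
    """Coarse, structured description of what the missing modality should express."""
    sentiment: str
    query: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def plan(text: str) -> SemanticPlan:
    """Stage 1 stand-in: a keyword rule in place of an MLLM planner (assumption)."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "love", "good")):
        return SemanticPlan("positive", [1.0, 0.0, 0.0])
    if any(word in lowered for word in ("bad", "hate", "awful")):
        return SemanticPlan("negative", [0.0, 1.0, 0.0])
    return SemanticPlan("neutral", [0.0, 0.0, 1.0])


def retrieve(semantic_plan: SemanticPlan, k: int = 2) -> List[Tuple[float, str, List[float]]]:
    """Stage 2 stand-in: non-parametric top-k nearest-neighbour lookup."""
    scored = [
        (cosine(semantic_plan.query, vec), label, vec)
        for label, vec in MEMORY_BANK
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def execute(evidence: List[Tuple[float, str, List[float]]]) -> List[float]:
    """Stage 3 stand-in: similarity-weighted average of the retrieved features."""
    dim = len(evidence[0][2])
    total = sum(score for score, _, _ in evidence)
    if total == 0:
        return [0.0] * dim
    return [
        sum(score * vec[i] for score, _, vec in evidence) / total
        for i in range(dim)
    ]


def generate_missing(text: str, k: int = 2) -> List[float]:
    """Run the three stages in sequence for one available-modality input."""
    return execute(retrieve(plan(text), k))
```

Calling `generate_missing("I love this movie")` returns a three-dimensional feature vector blended from the two nearest memory entries. The point of the sketch is only the decoupling: the plan is fixed before any retrieval happens, and the executor sees retrieved evidence as soft inputs rather than copying a single nearest neighbour.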