MARC View :: Library Catalog

Bibliographic Details
Main Authors:	He, Jun; Ye, Junyan; Huang, Zilong; Jiang, Dongzhi; Zhang, Chenjue; Zhu, Leqi; Zhang, Renrui; Zhang, Xiang; Li, Weijia
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01756

_version_	1866908805035458560
author	He, Jun; Ye, Junyan; Huang, Zilong; Jiang, Dongzhi; Zhang, Chenjue; Zhu, Leqi; Zhang, Renrui; Zhang, Xiang; Li, Weijia
author_facet	He, Jun; Ye, Junyan; Huang, Zilong; Jiang, Dongzhi; Zhang, Chenjue; Zhu, Leqi; Zhang, Renrui; Zhang, Xiang; Li, Weijia
contents	While text-to-image generation has achieved unprecedented fidelity, the vast majority of existing models function fundamentally as static text-to-pixel decoders. Consequently, they often fail to grasp implicit user intentions. Although emerging unified understanding-generation models have improved intent comprehension, they still struggle to accomplish tasks involving complex knowledge reasoning within a single model. Moreover, constrained by static internal priors, these models remain unable to adapt to the evolving dynamics of the real world. To bridge these gaps, we introduce Mind-Brush, a unified agentic framework that transforms generation into a dynamic, knowledge-driven workflow. Simulating a human-like 'think-research-create' paradigm, Mind-Brush actively retrieves multimodal evidence to ground out-of-distribution concepts and employs reasoning tools to resolve implicit visual constraints. To rigorously evaluate these capabilities, we propose Mind-Bench, a comprehensive benchmark comprising 500 distinct samples spanning real-time news, emerging concepts, and domains such as mathematical and Geo-Reasoning. Extensive experiments demonstrate that Mind-Brush significantly enhances the capabilities of unified models, realizing a zero-to-one capability leap for the Qwen-Image baseline on Mind-Bench, while achieving superior results on established benchmarks like WISE and RISE.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01756
institution	arXiv
publishDate	2026
record_format	arxiv
title	Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.01756

Similar Items