Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Jiawei; Zhang, Chi; Feng, Xiang; Zhang, Qiming; Qiu, Haibo; He, Lihuo; Ye, Dengpan; Gao, Xinbo; Zhang, Jing
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.17508

_version_	1866912976323215360
author	Zhou, Jiawei; Zhang, Chi; Feng, Xiang; Zhang, Qiming; Qiu, Haibo; He, Lihuo; Ye, Dengpan; Gao, Xinbo; Zhang, Jing
author_facet	Zhou, Jiawei; Zhang, Chi; Feng, Xiang; Zhang, Qiming; Qiu, Haibo; He, Lihuo; Ye, Dengpan; Gao, Xinbo; Zhang, Jing
contents	We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial challenge for the current generation of LMMs: it demands an unprecedented synergy between high-fidelity visual perception -- to parse intricate spatial hierarchies and symbolic details -- and precise generative expression -- to synthesize syntactically sound and logically consistent code. Unlike traditional descriptive tasks, Omni-I2C requires a holistic understanding where any minor perceptual hallucination or coding error leads to a complete failure in visual reconstruction. Omni-I2C features 1080 meticulously curated samples, defined by its breadth across subjects, image modalities, and programming languages. By incorporating authentic user-sourced cases, the benchmark spans a vast spectrum of digital content -- from scientific visualizations to complex symbolic notations -- each paired with executable reference code. To complement this diversity, our evaluation framework provides necessary depth; by decoupling performance into perceptual fidelity and symbolic precision, it transcends surface-level accuracy to expose the granular structural failures and reasoning bottlenecks of current LMMs. Our evaluation reveals a substantial performance gap among leading LMMs; even state-of-the-art models struggle to preserve structural integrity in complex scenarios, underscoring that multimodal code generation remains a formidable challenge. Data and code are available at https://github.com/MiliLab/Omni-I2C.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_17508
institution	arXiv
publishDate	2026
record_format	arxiv
title	Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.17508
