Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Wu, Xian, Zhang, Ming, Fang, Zhiyu, Li, Fei, Wang, Bin, Jiang, Yong, Zhou, Hao
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval
Online Access:	https://arxiv.org/abs/2512.20034
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

The automation of user interface development has the potential to accelerate software delivery by mitigating intensive manual implementation. Despite the advancements in Large Multimodal Models for design-to-code translation, existing methodologies predominantly yield unstructured, flat codebases that lack compatibility with component-oriented libraries such as React or Angular. Such outputs typically exhibit low cohesion and high coupling, complicating long-term maintenance. In this paper, we propose \textbf{VSA (VSA)}, a multi-stage paradigm designed to synthesize organized frontend assets through visual-structural alignment. Our approach first employs a spatial-aware transformer to reconstruct the visual input into a hierarchical tree representation. Moving beyond basic layout extraction, we integrate an algorithmic pattern-matching layer to identify recurring UI motifs and encapsulate them into modular templates. These templates are then processed via a schema-driven synthesis engine, ensuring the Large Language Model generates type-safe, prop-drilled components suitable for production environments. Experimental results indicate that our framework yields a substantial improvement in code modularity and architectural consistency over state-of-the-art benchmarks, effectively bridging the gap between raw pixels and scalable software engineering.

Similar Items