Bibliographic Details
Main Authors: Yang, Tao; Luo, Yingmin; Qi, Zhongang; Wu, Yang; Shan, Ying; Chen, Chang Wen
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2406.02884
_version_ 1866913585723080704
author Yang, Tao
Luo, Yingmin
Qi, Zhongang
Wu, Yang
Shan, Ying
Chen, Chang Wen
contents Layout generation is the keystone of automated graphic design: it requires arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation, leveraging a multi-modal large language model (MLLM) to accommodate diverse design tasks. Unlike those approaches, our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing existing datasets' limitations in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. Finally, we develop an automated text-to-poster system that generates editable SVG posters based on users' design intentions, bridging the gap between layout generation and real-world graphic design applications. This system integrates our proposed layout generation method as its core component, demonstrating its effectiveness in practical scenarios. The code and datasets are open-sourced at https://github.com/posterllava/PosterLLaVA.
format Preprint
id arxiv_https___arxiv_org_abs_2406_02884
institution arXiv
publishDate 2024
record_format arxiv
title PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2406.02884