Staff View: PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Tao; Luo, Yingmin; Qi, Zhongang; Wu, Yang; Shan, Ying; Chen, Chang Wen
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.02884

_version_	1866913585723080704
author	Yang, Tao; Luo, Yingmin; Qi, Zhongang; Wu, Yang; Shan, Ying; Chen, Chang Wen
author_facet	Yang, Tao; Luo, Yingmin; Qi, Zhongang; Wu, Yang; Shan, Ying; Chen, Chang Wen
contents	Layout generation is the keystone in achieving automated graphic design, which requires arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation, leveraging the multi-modal large language model (MLLM) to accommodate diverse design tasks. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing existing datasets' limitations in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. Finally, we develop an automated text-to-poster system that generates editable SVG posters based on users' design intentions, bridging the gap between layout generation and real-world graphic design applications. This system integrates our proposed layout generation method as the core component, demonstrating its effectiveness in practical scenarios. The code and datasets are open-sourced at https://github.com/posterllava/PosterLLaVA.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_02884
institution	arXiv
publishDate	2024
record_format	arxiv
title	PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2406.02884

Similar Items