Staff View: AutoPresent: Designing Structured Visuals from Scratch :: Library Catalog

Bibliographic Details
Main Authors:	Ge, Jiaxin; Wang, Zora Zhiruo; Zhou, Xuhui; Peng, Yi-Hao; Subramanian, Sanjay; Tan, Qinyue; Sap, Maarten; Suhr, Alane; Fried, Daniel; Neubig, Graham; Darrell, Trevor
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2501.00912

_version_	1866918065065689088
author	Ge, Jiaxin; Wang, Zora Zhiruo; Zhou, Xuhui; Peng, Yi-Hao; Subramanian, Sanjay; Tan, Qinyue; Sap, Maarten; Suhr, Alane; Fried, Daniel; Neubig, Graham; Darrell, Trevor
author_facet	Ge, Jiaxin; Wang, Zora Zhiruo; Zhou, Xuhui; Peng, Yi-Hao; Subramanian, Sanjay; Tan, Qinyue; Sap, Maarten; Suhr, Alane; Fried, Daniel; Neubig, Graham; Darrell, Trevor
contents	Designing structured visuals such as presentation slides is essential for communicative needs, necessitating both content creation and visual planning skills. In this work, we tackle the challenge of automated slide generation, where models produce slide presentations from natural language (NL) instructions. We first introduce SlidesBench, the first benchmark for slide generation, with 7k training and 585 testing examples derived from 310 slide decks across 10 domains. SlidesBench supports evaluations that are (i) reference-based, to measure similarity to a target slide, and (ii) reference-free, to measure the design quality of generated slides alone. We benchmark end-to-end image generation and program generation methods with a variety of models, and find that programmatic methods produce higher-quality slides in user-interactable formats. Building on the success of program generation, we create AutoPresent, an 8B Llama-based model trained on 7k pairs of instructions and slide-generation code, which achieves results comparable to the closed-source model GPT-4o. We further explore iterative design refinement, where the model is tasked to self-refine its own output, and find that this process improves the slide's quality. We hope that our work will provide a basis for future work on generating structured visuals.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_00912
institution	arXiv
publishDate	2025
record_format	arxiv
title	AutoPresent: Designing Structured Visuals from Scratch
topic	Computer Vision and Pattern Recognition; Computation and Language
url	https://arxiv.org/abs/2501.00912