Bibliographic Details
Main Authors: Zhang, Juan, Chen, Jiahao, Wang, Cheng, Yu, Zhiwang, Qi, Tangquan, Liu, Can, Wu, Di
Format: Preprint
Published: 2024
Subjects: Multimedia
Online Access: https://arxiv.org/abs/2403.11700
_version_ 1866913277791961088
author Zhang, Juan
Chen, Jiahao
Wang, Cheng
Yu, Zhiwang
Qi, Tangquan
Liu, Can
Wu, Di
contents With the widespread popularity of internet celebrity marketing around the world, short video production has gradually become a popular way of presenting product information. However, traditional video production usually involves a series of procedures such as script writing, filming in a professional studio, video clipping, special effects rendering, customized post-processing, and so forth. Moreover, multilingual videos are out of reach for creators who do not speak the target languages. These complicated procedures usually need a professional team to complete, which makes short video production costly in both time and money. This paper presents Virbo, an intelligent system that supports the automatic generation of talking avatar videos. Given only a user-specified script, Virbo uses a deep generative model to generate the target talking video. The system also supports multimodal inputs to customize the video with a specified face, a specified voice, and special effects. It also integrates a multilingual customization module that supports batch generation of multilingual talking avatar videos with hundreds of delicate templates and creative special effects. Through a series of user studies and demo tests, we found that Virbo can generate talking avatar videos of a quality comparable to those from a professional team while significantly reducing overall production costs. This intelligent system will effectively promote the video production industry and facilitate internet marketing regardless of language barriers and cost challenges.
format Preprint
id arxiv_https___arxiv_org_abs_2403_11700
institution arXiv
publishDate 2024
record_format arxiv
title Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
topic Multimedia
url https://arxiv.org/abs/2403.11700