| Main authors: | Ki, Taekyung; Min, Dongchan |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Keywords: | |
| Online access: | https://arxiv.org/abs/2305.00521 |
| author | Ki, Taekyung; Min, Dongchan |
|---|---|
| contents | In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing video from arbitrary audio. To generate videos of arbitrary identities, we leverage an expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, in which video consistency can also be designed through a linear transformation. In contrast to previous lip-sync methods, we introduce pose-aware masking, which uses a 3D parametric mesh predictor to locate the mask dynamically in each frame, improving naturalness across frames. Moreover, we propose a few-shot lip-sync adaptation method for an arbitrary person that introduces a sync regularizer to preserve lip-sync generalization while enhancing person-specific visual information. Extensive experiments demonstrate that our model generates accurate lip-sync videos even in the zero-shot setting, and that the proposed adaptation method enhances the characteristics of an unseen face using only a few seconds of target video. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2305_00521 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| title | StyleLipSync: Style-based Personalized Lip-sync Video Generation |
| topic | Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning |
| url | https://arxiv.org/abs/2305.00521 |