| Main Authors: | Wei, Fanyue; Zeng, Wei; Li, Zhenyang; Yin, Dawei; Duan, Lixin; Li, Wen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2407.06642 |
| _version_ | 1866916329189015552 |
|---|---|
| author | Wei, Fanyue; Zeng, Wei; Li, Zhenyang; Yin, Dawei; Duan, Lixin; Li, Wen |
| author_facet | Wei, Fanyue; Zeng, Wei; Li, Zhenyang; Yin, Dawei; Duan, Lixin; Li, Wen |
| contents | Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To this end, we design a novel reinforcement learning framework that uses the deterministic policy gradient method for personalized text-to-image generation, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2407_06642 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2407.06642 |