Saved in:
| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.05149 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866916645820170240 |
|---|---|
| author | Sahu, Rajdeep Roshan |
| author_facet | Sahu, Rajdeep Roshan |
| contents | This research focuses on the development and enhancement of text-to-image denoising diffusion models, addressing key challenges such as limited sample diversity and training instability. By incorporating Classifier-Free Guidance (CFG) and Exponential Moving Average (EMA) techniques, this study significantly improves image quality, diversity, and stability. Utilizing Hugging Face's state-of-the-art text-to-image generation model, the proposed enhancements establish new benchmarks in generative AI. This work explores the underlying principles of diffusion models, implements advanced strategies to overcome existing limitations, and presents a comprehensive evaluation of the improvements achieved. Results demonstrate substantial progress in generating stable, diverse, and high-quality images from textual descriptions, advancing the field of generative artificial intelligence and providing new foundations for future applications.
Keywords: Text-to-image, Diffusion model, Classifier-free guidance, Exponential moving average, Image generation. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2503_05149 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Development and Enhancement of Text-to-Image Diffusion Models Sahu, Rajdeep Roshan Computer Vision and Pattern Recognition Artificial Intelligence This research focuses on the development and enhancement of text-to-image denoising diffusion models, addressing key challenges such as limited sample diversity and training instability. By incorporating Classifier-Free Guidance (CFG) and Exponential Moving Average (EMA) techniques, this study significantly improves image quality, diversity, and stability. Utilizing Hugging Face's state-of-the-art text-to-image generation model, the proposed enhancements establish new benchmarks in generative AI. This work explores the underlying principles of diffusion models, implements advanced strategies to overcome existing limitations, and presents a comprehensive evaluation of the improvements achieved. Results demonstrate substantial progress in generating stable, diverse, and high-quality images from textual descriptions, advancing the field of generative artificial intelligence and providing new foundations for future applications. Keywords: Text-to-image, Diffusion model, Classifier-free guidance, Exponential moving average, Image generation. |
| title | Development and Enhancement of Text-to-Image Diffusion Models |
| topic | Computer Vision and Pattern Recognition Artificial Intelligence |
| url | https://arxiv.org/abs/2503.05149 |