A Generation Algorithm for Text to Image

Project Code: TCMAPY1974

Objective

The objective of this project is to develop a text-to-image generation system based on Stable Diffusion and LoRA fine-tuning, focused on generating high-quality bird images. The system uses a diffusion model, fine-tuned on the CUB-200-2011 dataset, to create photorealistic images of specific bird species from textual descriptions. Stable Diffusion combines a U-Net for denoising, a CLIP text encoder for encoding the prompt, and a Variational Autoencoder (VAE) for compressing images into a latent space and reconstructing them. LoRA makes fine-tuning efficient: the pre-trained weights stay frozen and only small low-rank matrices are trained, which reduces training time and memory usage while maintaining performance. Ultimately, the project seeks to provide a versatile tool for generating custom bird images from user prompts.
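
As a rough illustration of why LoRA (Low-Rank Adaptation) fine-tuning is cheap, the sketch below applies a low-rank update to a single linear layer and compares parameter counts. It is written in plain Python and is not taken from the project code. The function names, the 768 x 768 layer size, and the rank of 4 are assumptions chosen only for the example.

```python
# Illustrative sketch of a LoRA update on one linear layer (pure Python).
# All names and sizes are hypothetical; they are not from the project code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank.

    w: frozen pre-trained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_counts(d_out, d_in, r):
    """Return (parameters for full fine-tuning, parameters for LoRA) for one layer."""
    return d_out * d_in, r * (d_out + d_in)

# Assumed example: a 768 x 768 projection adapted with rank 4.
full, lora = lora_param_counts(768, 768, 4)
print(full, lora)  # 589824 6144
```

In this assumed setting the LoRA matrices hold 96 times fewer trainable parameters than the full weight. When B is initialised to zero, as is common for LoRA, the effective weight equals the pre-trained weight at the start of training.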

Abstract

This project develops a text-to-image generation system using the Stable Diffusion 1.5 model fine-tuned on the CUB-200-2011 dataset. The primary objective is to generate high-quality bird images from textual descriptions. Stable Diffusion, a latent diffusion model, relies on three key components: a CLIP text encoder for text understanding, a U-Net for denoising in latent space, and a VAE (Variational Autoencoder) for encoding images into and decoding them from the latent space. LoRA (Low-Rank Adaptation) is used to fine-tune the pre-trained model efficiently, adapting it to bird image generation from textual prompts. The system offers image generation on CPU, fast CPU, and GPU, so users can choose a mode that matches their available computational resources. The application is built with the Flask framework, uses SQLite to store user data, and has a front end written in HTML, CSS, and JavaScript. Generated images are evaluated with FID (Fréchet Inception Distance) and Inception Score, two standard metrics for assessing the quality of generated images. The approach demonstrates how a pre-trained generative model can be adapted to text-to-image synthesis on a specialized dataset such as CUB-200-2011.
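
To make the Inception Score concrete, the sketch below computes it from a list of per-image class probabilities using the standard definition, the exponential of the mean KL divergence between each p(y|x) and the marginal p(y). In practice these probabilities come from a pre-trained Inception classifier applied to the generated images. Here they are small hand-written lists, and the function name and inputs are assumptions for illustration, not the project's evaluation code.

```python
import math

def inception_score(probs):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from class probabilities.

    probs: list of per-image probability lists, one entry per class.
    In a real evaluation these would be classifier outputs; here they
    are supplied directly (an assumption made for this sketch).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        kl_sum += sum(p_j * math.log(p_j / m_j)
                      for p_j, m_j in zip(p, marginal) if p_j > 0)
    return math.exp(kl_sum / n)

# Confident and diverse predictions give a high score (upper bound: number of classes).
sharp = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Uninformative predictions give the minimum score of 1.
flat = inception_score([[0.5, 0.5], [0.5, 0.5]])
print(sharp, flat)
```

A higher Inception Score indicates images that are both individually recognisable and varied across classes. FID, by contrast, compares feature statistics of generated and real images, and a lower value is better.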