What is the mechanism behind AI art?
Watching an AI art generator nail your prompt can feel like magic, but under the hood it all comes down to computing power, machine learning algorithms, high-performance graphics cards, and massive training datasets.
Let's delve into the mechanics.
AI art generators take a text prompt and attempt to turn it into a matching image. Before they can do that, these apps first have to understand what you're asking for. That understanding comes from training AI algorithms on huge collections of image-text pairs, anywhere from hundreds of thousands to billions of them. This is how the models learn to tell different subjects apart, whether that's dogs versus cats or art styles like Vermeer versus Picasso. How deep that understanding goes varies from model to model, depending on the size and specificity of its training data.
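This text-image matching is what models like OpenAI's CLIP learn (several of the generators mentioned below build on it). To make it concrete, here's a minimal sketch of scoring how well a few captions match an image, using the open-source CLIP weights via Hugging Face's transformers library. The image path and captions are placeholders:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load the open-source CLIP model and its preprocessor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local image
captions = ["a photo of a dog", "a photo of a cat", "a painting by Vermeer"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds one similarity score per caption;
# softmax turns the scores into probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0]):
    print(f"{p:.2%}  {caption}")
```

A model trained this way can tell which descriptions fit an image, which is exactly the understanding a generator needs to work backward from your prompt.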
Once the AI understands the prompt, it gets to work rendering the image. There are two main approaches:
1. Diffusion models, such as Stable Diffusion, DALL·E 2, Midjourney, and CLIP-Guided Diffusion, start with a field of random noise and then iteratively refine it, step by step, until it matches the prompt (there's a code sketch of this after the list).
2. Generative Adversarial Networks (GANs), including VQGAN-CLIP, BigGAN, and StyleGAN, take a different approach: a generator network creates images while a discriminator network judges how real they look, and the two are trained against each other (also sketched below). GANs have been around longer than diffusion models.
While both types of models are capable of generating realistic outputs, diffusion models excel at producing unconventional or imaginative images.
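Because Stable Diffusion is open source, you can run that iterative denoising process yourself. Here's a minimal sketch using Hugging Face's diffusers library, assuming a machine with a CUDA GPU; the prompt and filename are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the open-source Stable Diffusion weights (downloads on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a portrait of a corgi in the style of Vermeer",  # example prompt
    num_inference_steps=50,  # how many denoising passes to run
    guidance_scale=7.5,      # how strongly to steer toward the prompt
).images[0]

image.save("corgi_vermeer.png")
```

The num_inference_steps parameter is the "iterative refinement" in action: each step removes a little more noise, nudging the image closer to something that matches the prompt.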
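And for contrast, here's a toy sketch of the adversarial setup GANs use. This isn't any production model, just a minimal generator/discriminator pair in PyTorch with made-up layer sizes, to show the two networks that get trained against each other:

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28  # hypothetical toy sizes

# The generator maps random noise to an image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)

# The discriminator scores how real an image looks (0 = fake, 1 = real).
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

noise = torch.randn(1, latent_dim)
fake_image = generator(noise)              # the generator's attempt
realism_score = discriminator(fake_image)  # the discriminator's verdict
print(realism_score.item())
```

During training, the generator tries to push that realism score up while the discriminator tries to push it down, and the tug-of-war is what teaches the generator to produce convincing images.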
While some apps are transparent about the models they use, others aren't. VQGAN-CLIP and Stable Diffusion, for example, are both open source, so there are plenty of apps that openly build on them, alongside many others that stay quiet about which model they use. Some apps also train open-source models further on proprietary data to improve their results.

The upshot is that a lot of AI art generators are essentially different user interfaces on top of the same art-generating algorithms. From a business standpoint, that's understandable, but it makes choosing and evaluating these apps frustrating. Wherever possible, I've noted which models each app uses. Where that information isn't disclosed, I've made informed guesses based on my experience with various generative AIs.