Image created by Decrypt using AI/Jose Lanz
The Mechanics of Stable Diffusion
The process of creating an image out of random noise. Credit: Jay Alammar
Pros and Cons of Stable Diffusion
Image created by Decrypt using AI/Jose Lanz
However, there are some drawbacks
Operating Stable Diffusion
Leonardo AI: Allows you to experiment with different models, some of which emulate the aesthetics of Midjourney.
Sea Art: A nice place to test a lot of Stable Diffusion models along with plugins and other advanced tools.
Mage Space: Offers Stable Diffusion v1.5 and v2.1; it also has a broad gallery of other models, though those require a membership.
Lexica: A user-friendly platform that guides you to discover optimal prompts for your images.
Google Colab: Another accessible option, running Stable Diffusion in a hosted notebook.
Navigating Stable Diffusion with Automatic 1111
Two different GUIs (A1111 and ComfyUI) running Stable Diffusion
Setting Up Automatic 1111
GUI for the Automatic1111 WebUI before launching
Automatic 1111 GUI
Checkpoint or Model: Essentially the heart of your AI image operation, these pre-trained Stable Diffusion weights can be compared to diverse artists trained in varied genres: one could be adept at anime, while another excels in realism. Your choice here sets the artistic style of your image.
Positive Prompt: This is where you articulate what you desire in your image.
Negative Prompt: Specify what you don’t want to see in your artwork here.
Create Style: If you wish to save a particular combination of positive and negative prompts as a ‘style’ for future use, do so by clicking here.
Apply Style: Implement a previously saved style to your current prompt.
Generate: Once you’ve set all parameters, click here to bring your image to life.
Sampling Steps: This parameter defines the steps taken to morph random noise into your final image. A range between 20 and 75 usually yields good results, with 25-50 being a practical middle ground.
Sampling Method: If the models represent the heart of this program, the sampler is the brain behind everything. This is the technique used to take your prompt, the encoders, and every parameter, and convert the noise into a coherent image according to your orders. There are many samplers, but we recommend “DDIM” for fast renders with few steps, “Euler a” for drawings or photos of people with smooth skin, and “DPM” for detailed images (DPM++ 2M Karras is probably a good safe bet).
Here is a compilation of the results obtained with the different sampling methods for Stable Diffusion.
Batch Count: Runs multiple batches of generations, one after the other, letting you create several different images from the same prompt. This takes longer but uses less VRAM, because each image is generated only after the previous one finishes.
Batch Size: The number of images generated in parallel within each batch. This gives you more images more quickly, but it also takes more VRAM because all of the images are generated in the same pass.
CFG Scale: Determines the model’s creative freedom, striking a balance between adhering to your prompt and following its own imagination. A low CFG lets the model ignore your prompt and be more creative, while a high CFG makes it stick to the prompt with no freedom at all. A value between 5 and 12 is typically safe, with 7.5 providing a reliable middle ground.
Width and Height: Specify your image size here. Starting resolutions could be 512×512, 512×768, 768×512, or 768×768. For SDXL (Stability AI’s latest model), the base resolution is 1024×1024.
Seed: Think of this as the unique ID of an image, setting a reference for the initial random noise. It’s crucial if you intend to replicate a particular result: the same seed with the same prompt and settings reproduces the same image (see the API sketch after this list). It’s also why you can never truly replicate a real-life photo 100%: real photos were never generated from a seed.
The Dice Icon: Sets the seed to -1, randomizing it. This guarantees uniqueness for each image generation.
The Recycle Icon: Retains the seed from the last image generation.
Script: Runs advanced, pre-built workflows (such as the X/Y/Z plot or prompt matrix) that impact your whole generation. As a beginner, you might want to leave this untouched for now.
Save: Save your generated image in a folder of your choice. Note that Stable Diffusion also auto-saves your images in its dedicated ‘output’ folder.
Send to img2img: Sends your output to the img2img tab, allowing it to be the reference for new generations that will resemble it.
Send to inpaint: Directs your image to the inpaint tab, enabling you to modify specific image areas, like eyes, hands, or artifacts.
Send to extras: This action relocates your image to the ‘extras’ tab, where you can upscale your image without significant detail loss.
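All of these parameters can also be set programmatically: the WebUI exposes a local REST API when launched with the --api flag. Below is a minimal sketch, assuming the default address of http://127.0.0.1:7860 (field names can vary slightly across WebUI versions); the keys mirror the GUI fields described above.

```python
import base64
import requests

# Minimal txt2img call against a locally running Automatic1111 WebUI
# (started with the --api flag). Keys mirror the GUI fields above.
payload = {
    "prompt": "portrait of an astronaut, detailed, studio lighting",
    "negative_prompt": "blurry, deformed hands, watermark",
    "sampler_name": "DPM++ 2M Karras",  # Sampling Method
    "steps": 30,                        # Sampling Steps
    "cfg_scale": 7.5,                   # CFG Scale
    "width": 512,
    "height": 768,
    "seed": -1,                         # -1 = random, like the dice icon
    "batch_size": 1,                    # parallel images per batch
    "n_iter": 1,                        # batch count (runs in sequence)
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

# The API returns images as base64 strings; save the first one to disk.
image_data = base64.b64decode(response.json()["images"][0])
with open("output.png", "wb") as f:
    f.write(image_data)
```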
Prompt Engineering 101: How to craft good prompts for SD v1.5
Positive Prompts, Negative Prompts, and Fine-Tuning Keyword Weight
What: Identify what you want: Portrait, Photo, Illustration, Drawing, etc.
Subject: Describe the subject you are thinking about: a beautiful woman, a superhero, an old Asian person, a Black soldier, little kids, a beautiful landscape.
Verb: Describe what the subject is doing: Is the woman posing for the camera? Is the superhero flying or running? Is the Asian person smiling or jumping?
Context: Describe the scenery of your idea: Where is the scene happening? In a park, in a classroom, in a crowded city? Be as descriptive as you possibly can.
Modifiers: Add additional information about your image: If it’s a photo, which lens was used? If it’s a painting, which artist painted it? Which type of lighting was used? Which site would feature it? Which clothing or fashion style are you thinking about? Is the image scary? These concepts are separated by commas, but remember: the closer they are to the beginning, the more prominent they will be in the final composition. If you don’t know where to start, this site and this GitHub repository have a lot of good ideas to experiment with if you don’t want to just copy/paste other people’s prompts. (A worked example of this recipe follows below.)
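To make the recipe concrete, here is a small illustrative sketch that assembles the five parts into one prompt string; every value in it is a placeholder, not a magic keyword.

```python
# Assemble a prompt from the five parts described above.
# All values are placeholders; swap in your own ideas.
parts = {
    "what": "photo",
    "subject": "an old Asian man",
    "verb": "smiling at the camera",
    "context": "in a crowded city at dusk",
    # Earlier modifiers carry more weight in the final composition.
    "modifiers": "85mm lens, soft golden-hour lighting, film grain",
}

prompt = ", ".join(parts.values())
print(prompt)
# photo, an old Asian man, smiling at the camera, in a crowded city at dusk,
# 85mm lens, soft golden-hour lighting, film grain
```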
Keyword Integration and Prompt Scheduling
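Automatic 1111 reads special syntax inside the prompt itself: parentheses raise or lower a keyword’s weight, and brackets schedule a swap partway through sampling. A quick sketch of both:

```python
# Keyword weighting: (keyword:weight) changes the attention a keyword
# receives; weights above 1.0 emphasize it, below 1.0 de-emphasize it.
prompt_weighted = "portrait of a woman, (freckles:1.3), (background:0.8)"

# Prompt scheduling: [from:to:when] swaps one keyword for another partway
# through sampling. Here the model renders "a forest" for the first half
# of the steps, then "a burning forest" for the rest.
prompt_scheduled = "[a forest:a burning forest:0.5], cinematic lighting"
```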
The Lazy Way Out: Copying Prompts
Civitai lets people see the prompts used for many images/Jose Lanz/Decrypt
Avoiding Pitfalls
Installing New Models
Civitai uses filters to let users personalize their searches/Jose Lanz/Decrypt Media
Example of a page to download a specific custom SD v1.5 model from Civitai. Generated by AI/Jose Lanz
Juggernaut, Photon, Realistic Vision, and aZovya Photoreal if you want to play with photorealistic images.
Dreamshaper, RevAnimated, and all the models by DucHaiten if you enjoy 3D art.
DuelComicMix, DucHaitenAnime, and iCoMix if you like 2D art like mangas and comics.
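Wherever you download them from, checkpoint files need to land in the WebUI’s models folder before they appear in the checkpoint dropdown. A tiny sketch of the default layout, assuming a standard install (the root folder name is an assumption; point it at your own install):

```python
from pathlib import Path

# Default A1111 layout: checkpoints live under models/Stable-diffusion.
# "stable-diffusion-webui" is an assumption; use your own install path.
webui_root = Path("stable-diffusion-webui")
checkpoints = webui_root / "models" / "Stable-diffusion"

# List installed models (.safetensors); refresh the checkpoint dropdown
# in the GUI to pick up newly added files.
for f in checkpoints.glob("*.safetensors"):
    print(f.name)
```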
Editing your image: Image-to-Image and Inpainting
Image created by Stable Diffusion (right) based on the photo used as reference (left) using Img2img/Jose Lanz
Blue hair edited using inpaint over the reference image of a blonde supergirl. Generated with AI/Jose Lanz
Top 5 Extensions to Enhance Stable Diffusion’s Capabilities
LoRAs: Because the Devil is in the Details
An image generated without LoRAs vs the same image generated using a LoRA to add more details. Credit: Jose Lanz
Click on the Extensions tab and select “Install from URL.”
Enter the URL: https://github.com/kohya-ss/sd-webui-additional-networks.git in the box and click on Install.
Once completed, click on “Installed” and then “Apply and restart UI.”
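With the extension installed and a LoRA file placed in your models/Lora folder, recent versions of the WebUI can also invoke a LoRA straight from the prompt. A minimal sketch, using a hypothetical LoRA file named add_detail:

```python
# LoRAs are invoked inline in the prompt with <lora:filename:weight>.
# "add_detail" is a hypothetical file name; the weight (here 0.8) scales
# how strongly the LoRA influences the result.
prompt = "photo of a castle on a cliff, <lora:add_detail:0.8>, golden hour"
```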
ControlNet: Unleashing the Power of Visual Magic
Installing ControlNet involves these simple steps:
Visit the extension page and select the ‘Install from URL’ tab.
Paste the following URL into the ‘URL for extension’s repository’ field: https://github.com/Mikubill/sd-webui-controlnet
Click ‘Install’.
Close and relaunch your Stable Diffusion interface so the extension loads.
The ‘Reference image box’ is where you upload the image you wish to reference for pose, face, color composition, structure, etc.
The ‘Control Type Selection’ is where the ControlNet wizardry occurs. This feature allows you to determine what you want to copy or control.
OpenPose: Pinpoints the body’s key points and replicates a pose. You can select a pose for the entire body, face, or hands using the preprocessor.
Canny: Converts your reference image into a black-and-white scribble with fine lines. This allows your creations to follow these lines as edges, resulting in an accurate resemblance to your reference.
Depth: Generates a ‘depth map’ to create a 3D impression of the image, distinguishing near and far objects—ideal for mimicking 3D cinematic shots and scenes.
Normal: A normal map infers the orientation of a surface, which is excellent for texturing objects like armor, fabrics, and exterior structures.
MLSD: Recognizes straight lines, making it ideal for reproducing architectural designs.
Lineart: Transforms an image into a drawing—useful for 2D visuals like anime and cartoons.
Softedge: Similar to a Canny model but with softer edges, offering more freedom to the model and slightly less precision.
Scribble: Converts an image into a scribble, yielding more generalized results than the Canny model. You can also draw a scribble in Paint (or any drawing app) and use it as a reference with no preprocessor to turn your doodles into realistic creations.
Segmentation: Creates a color map of your image, inferring the objects within it; every color represents a specific kind of object. You can use it to redecorate your image or reimagine a scene with the same concept (for example, turn a photo from the 1800s into a photorealistic depiction of the same scenery in a cyberpunk alternate reality, or just redecorate your room with a different bed, walls of a different color, and so on).
Tile: Adds details to the picture and facilitates upscaling without overburdening your GPU.
Inpaint: Modifies the image or expands its details. Thanks to a recent update and the “inpaint only + lama” preprocessor, you can now outpaint images with extreme attention to detail.
Shuffle: Reproduces the color structure of a reference image.
Reference: Generates images similar to your reference in style, composition, and occasionally faces.
T2IA: Lets you control the color and artistic composition of your image.
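ControlNet can also be driven through the same local API shown earlier: the extension hooks into the txt2img endpoint via alwayson_scripts. Here is a minimal sketch, assuming the WebUI is running with --api, the ControlNet extension is installed, and you have downloaded an OpenPose model (the exact model filename below is an assumption, so check your own models folder):

```python
import base64
import requests

# Read the reference image and encode it for the API.
with open("pose_reference.png", "rb") as f:
    reference_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "a knight standing in a courtyard, detailed armor",
    "steps": 30,
    # The ControlNet extension hooks into txt2img via "alwayson_scripts".
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": reference_b64,
                "module": "openpose",  # the preprocessor / Control Type
                # Model filename depends on what you downloaded; this
                # name is an assumption, check your own models folder.
                "model": "control_v11p_sd15_openpose",
                "weight": 1.0,
            }]
        }
    },
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()
```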
Roop: Deepfakes at Your Fingertips
Image edited using Roop to change a face for a provided reference. Credit: José Lanz
Photopea: A Free Photo Editor Inside Stable Diffusion
How the Photopea extension looks inside of A1111
CLIP Interrogator: Creating Prompts from Any Image
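If you want the same trick outside the WebUI, the open-source clip-interrogator Python package offers it as a library. A minimal sketch, assuming you have run pip install clip-interrogator:

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the image you want to reverse-engineer a prompt from.
image = Image.open("mystery_image.png").convert("RGB")

# ViT-L-14/openai is the CLIP model matching Stable Diffusion v1.5.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

# Produces a text prompt describing the image, usable as a starting point.
print(ci.interrogate(image))
```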
Conclusion
Image created by Decrypt using AI/Jose Lanz