Stable Diffusion

Time to play around with generative AI and revisit my landscape photography hobby!


I’ve decided to tinker with generative AI tools to get a feel for them and build some understanding. As any good self-hoster would, I wanted to keep it all local, and thus went with the obvious choice of Stable Diffusion. A relatively modest (as these things go) new GPU, an RTX 3060, seemed sufficient to serve the need, and I’m running it in an older secondary desktop which was previously used for my photography (and has been sitting idle for quite some time now).

It took a little effort to get it installed correctly, mostly because the newer distro release only ships Python 3.12 out-of-the-box; I hit some dependency issues trying to get things working on that version. Adding a PPA for the older Python 3.10, however, let me easily set up what seems to be a very popular tool, AUTOMATIC1111’s web UI – and run it headlessly on that secondary PC without issue.
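
For anyone curious what local generation boils down to underneath a UI like this, here’s a minimal sketch using the Hugging Face diffusers library. To be clear, this is an assumption-laden illustration on my part, not what AUTOMATIC1111 actually runs internally; the model ID and filename are just examples:

```python
# A minimal local txt2img sketch with diffusers (an illustration only;
# AUTOMATIC1111's web UI wraps its own pipeline, not this exact code).
import torch
from diffusers import StableDiffusionXLPipeline

# Download the SDXL base checkpoint and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # half precision helps fit the 3060's VRAM
)
pipe = pipe.to("cuda")

# Generate a single image from a text prompt and save it.
image = pipe("beautiful landscape, mountains with pointy peaks").images[0]
image.save("landscape.png")
```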

Loading the UI, you are presented with a LOT of options and tools. Some are fairly obvious: the prompt and negative prompt, the image dimensions, and the seed (which I guessed had to do with the random initialization, and that was correct). But many, many others are included, and that’s just for the txt2img tool (the most common one, which turns a text prompt into an image). Checkpoints stood out as something relatively important (very much so) and I have some understanding of them now. Hires. fix appears to be a two-pass resolution enhancer. Further functionality such as LoRAs and refiners I’ve barely touched and still need to play with.
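
Those common knobs map fairly directly onto code, too. Here’s a sketch of how the same settings look as diffusers arguments, reusing the `pipe` object from the earlier snippet (again, an illustration of the concepts, not the web UI’s implementation):

```python
# The common txt2img knobs, expressed as diffusers arguments.
# Assumes the SDXL `pipe` loaded in the previous snippet.
import torch

# The "seed" field: fixes the random initialization so runs are repeatable.
generator = torch.Generator("cuda").manual_seed(1234)

image = pipe(
    prompt="beautiful landscape, mountains with pointy peaks",
    negative_prompt="blurry, low quality",  # things to steer away from
    width=1024, height=1024,                # image dimensions
    num_inference_steps=30,                 # sampling steps
    guidance_scale=7.0,                     # how strictly to follow the prompt
    generator=generator,
).images[0]
image.save("landscape_seeded.png")
```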

Some results from the Stable Diffusion XL model and the following prompt:

beautiful landscape, mountains with pointy peaks, 8k, high-resolution photograph, cinematic composition, epic, hyper-detailed, summer, wildflowers, wide shot, masterpiece, (rule of thirds), sunset photo at golden hour, alpine, path, stream, leading line, colorful puffy clouds, deep depth of field, dramatic lighting, hdr effect


As for tools to explore next, there is img2img, which generates an image from another image (instead of from text alone) and includes features such as sketch and inpaint. From my understanding, these give you some fine-grained control over the generative process and definitely seem worth exploring. I’m hoping they make it easy to create a rough draft of an image idea (as a painter would), and then let the model fill in the details; it would be very rewarding to generate my own images by providing the overall high-level composition and allowing the AI to fill in the remainder.
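
That rough-draft workflow looks roughly like this in code. A sketch using diffusers’ img2img pipeline, with a hypothetical draft file; the `strength` parameter is the interesting knob here:

```python
# A rough-draft-to-image sketch with img2img (diffusers' pipeline as an
# illustration; the web UI exposes equivalent controls).
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# A crude draft of the composition, e.g. painted in any image editor.
# ("rough_draft.png" is a placeholder filename.)
draft = Image.open("rough_draft.png").convert("RGB")

image = pipe(
    prompt="alpine meadow at golden hour, stream, wildflowers",
    image=draft,
    strength=0.6,  # lower keeps more of the draft; higher lets the model improvise
).images[0]
image.save("refined.png")
```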

It’s challenging to find time for my landscape photography hobby and the types of subjects I typically shoot nowadays, and of course you are always at the mercy of the weather, lighting and nature for your compositions… so being able to scratch this itch at home sounds intriguing. I’ve also had a desire to try oil painting a bit more than the few times in my life I’ve attempted it; this seems like potentially a nice mashup of the two.

Some other learnings:

  • The “quality” of the results you get is HEAVILY influenced by the checkpoint you use. If you are trying to match some high-quality example images you’ve seen, you will definitely need to find the right model. There are an incredible number of them available, and while some provide good general-purpose results, many are tailored for specific styles. Civitai seems like a great resource for these.
  • Secondly, your prompts are crucial, and the results can be highly sensitive to individual keywords and their weighting.
  • Also, the seed can matter quite a bit. You could generate dozens of images (easy with the batch tool, or a simple loop like the sketch below) and maybe only one or two would really match what you are looking for.
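
On that last point, sweeping seeds to cherry-pick a result is a one-liner loop. A sketch, again assuming the txt2img `pipe` from earlier; the web UI’s batch count does essentially the same thing:

```python
# Sweep a handful of seeds and save each result for later cherry-picking.
import torch

for seed in range(8):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(
        prompt="beautiful landscape, mountains with pointy peaks",
        generator=generator,
    ).images[0]
    image.save(f"landscape_seed_{seed}.png")
```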


As an idea of something else to potentially explore, I’m rather curious how training a custom model works, specifically the logistics of it, such as:

  • How many source images do you need to provide?
  • How long does training take for a given dataset?
  • What kinds of results can you achieve for a given dataset input?


Finally, this does leave me wondering: what will the future hold? The obvious question concerns the evolution of these tools, but taking a larger view, what impact will these kinds of technologies have on the world? I found it rather surprising how easy it was to get this set up and generating (in my opinion!) fairly high-quality results. It’s also a lot of fun to play with!

One thing seems highly likely, if not inevitable: these tools will get easier to use, the hardware barrier to entry will lower over time, and the technology will become more mainstream until it’s in everyone’s hands. I can envision many possibilities for how it will impact society, and it will be interesting to see how it all plays out. And after seeing all of this, I’m also a bit relieved that I never decided to take the landscape photography gig full-time!
