FAQ
Common questions and honest answers.
General
What is Stable Diffusion?
Stable Diffusion is an AI model that generates images from text descriptions. You type a prompt like "a cat sitting on a beach at sunset" and it does its best to create an image of that. It works by starting with random noise and gradually refining it into a picture, guided by your text.
picoDiffusion uses Stable Diffusion version 1.5, which is a well-established version with a huge library of community-made models, styles, and add-ons.
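For the curious, here is roughly what a single SD 1.5 generation looks like in Python using the Hugging Face diffusers library. This is a generic sketch, not picoDiffusion's actual code, and the checkpoint path is made up, but the basic idea is the same: load a checkpoint, hand it a prompt, and let it denoise for a number of steps.

import torch
from diffusers import StableDiffusionPipeline

# Load an SD 1.5 checkpoint from a single .safetensors file (hypothetical path).
pipe = StableDiffusionPipeline.from_single_file(
    "cache/models/checkpoints/your_checkpoint.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# One text-to-image generation: the model starts from random noise and
# refines it over a number of steps, guided by the prompt.
image = pipe(
    "a cat sitting on a beach at sunset",
    negative_prompt="blurry, low quality",
    num_inference_steps=24,
    guidance_scale=7.0,
    width=512,
    height=512,
).images[0]
image.save("cat.png")

The num_inference_steps, guidance_scale, width, and height arguments correspond, at least roughly, to the steps, CFG, width, and height controls in the UI.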
Is this free?
Yes. picoDiffusion is free and open source. Everything runs on your own computer — there are no subscriptions, API fees, or cloud costs.
Most models (checkpoints, VAEs, LoRAs) on sites like civitai.com and HuggingFace are free to download. Some creators do charge for their models — just like real life, not everything is free. But what is available for free is genuinely good, especially for learning, and that is our goal. We have listed some solid free models on the Getting Started page to get you going.
That said, this project is built by a real person on a real budget. If you find it useful and want to help keep it going, there is a Patreon (link coming soon). No pressure — the tool will always be free.
How is this different from Midjourney, DALL-E, or other AI image tools?
Those services run in the cloud on someone else's servers. You pay per image (or per subscription) and your prompts go through their systems.
picoDiffusion runs entirely on your own computer. Nothing leaves your machine. There are no content filters, no usage limits, and no ongoing costs beyond the electricity to run your GPU. But do please try to be mature about it. We all know what you are going to do. Try making astronauts on a seesaw instead.
Can I use the images I generate for anything?
That depends on the checkpoint you use. Each checkpoint has its own licence. Some allow commercial use, some do not. Check the licence on the checkpoint's download page before using generated images commercially.
Hardware
What GPU do I need?
Any NVIDIA GPU with at least 4GB of VRAM. Older cards like the Quadro P2000, GTX 1050 Ti, or GTX 970 work fine. They are slower than modern cards, but they produce the same quality images — they just take longer. Newer cards (RTX 3060, 4070, etc.) will absolutely work and will be faster — the CUDA version we use supports everything from Pascal (2016) through the latest generation.
More VRAM lets you generate larger images. 4GB is enough for 512x512 and sometimes 768x768. 8GB gives you plenty of room. 16GB? We have never been near a 16GB GPU. Is it fun?
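Not sure what card you have or how much VRAM it has? nvidia-smi will tell you, and so will a few lines of Python with PyTorch. This is a generic check, nothing picoDiffusion-specific:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA GPU detected; generation would fall back to the CPU")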
Can I use an AMD GPU?
Not right now. picoDiffusion uses NVIDIA CUDA for GPU acceleration. AMD GPUs use a different system called ROCm, which needs a completely different Docker setup, different PyTorch builds, and different testing.
Could it work? Probably. PyTorch does support ROCm. But we do not have an AMD GPU to develop and test on. Hardware is the bottleneck, not willingness.
Can I run this on macOS?
The short answer is: not easily, and we cannot help much here. We want to be upfront about why.
Apple Silicon (M1, M2, M3, M4) does support PyTorch through MPS (Metal Performance Shaders), and the picoDiffusion code already detects and uses it. The problem is Docker. Docker on Mac runs a Linux virtual machine under the hood, and that VM cannot access the Metal GPU. So the Docker container falls back to CPU mode, which is painfully slow.
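For the curious, the usual PyTorch device-selection pattern looks something like the sketch below (a generic illustration, not a quote of picoDiffusion's code). Inside the Docker VM on a Mac, the first two checks both fail, so you land on the CPU.

import torch

# Prefer CUDA, then Apple's Metal backend (MPS), then fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f"Using device: {device}")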
Running natively on Mac without Docker (just Python and pip) would give you MPS acceleration and reasonable speed. But packaging that into something easy to install is a whole separate challenge. A standalone Mac app would need to bundle Python and all the libraries, and distributing it would require an Apple Developer account to sign and notarise the app — which is an ongoing cost. We are honestly not sure you can even sign an app that bundles Python and PyTorch like that.
We also do not have Apple Silicon hardware to develop and test on. We would really love to support it properly — we genuinely like macOS. But the reality is that a refurbished laptop with an NVIDIA GPU can be found for a few hundred dollars at any online flea market; install Linux on it and picoDiffusion runs with no compromises. That is literally how this project was built.
If you do have a Mac and want to try running natively, the code itself will work — it is standard Python. You would need to install Python 3.12, pip install the dependencies, and run Uvicorn directly. We have an at-your-own-risk native install guide that covers this — it is not officially supported, but it should get you going.
Can I run this on Windows?
Not currently. Docker on Windows has the same problem as Docker on Mac — it runs a Linux VM, and GPU passthrough is complicated. WSL2 with CUDA support is a thing that exists, but making it work reliably with Docker and NVIDIA is a whole adventure that varies by Windows version, driver version, and WSL configuration. We do not have the resources to support and test that.
We also do not have Windows hardware to develop on. picoDiffusion was built on a $350 refurbished laptop from an online flea market that came with Windows 11. We promptly installed Linux Mint on it because, well, it was Windows 11 on an old laptop. You know how well that works out.
Linux is free, the NVIDIA drivers work well, and Docker just works. If you have a Windows machine with an NVIDIA GPU, installing Linux alongside it (dual boot) or replacing Windows entirely is the most reliable path. Your laptop will probably thank you.
If you are on Windows and just want to generate AI images without all of this, Windows has Copilot built in — you could try asking it.
Can I run this without a GPU at all?
Technically yes, but practically no. CPU inference is extremely slow — a single 512x512 image can take 5 to 15 minutes. During that time, every CPU core will be running at 100%. Your fans will spin up until your computer sounds like a sick Learjet. Other applications will be sluggish. Your laptop may generate enough hot air to achieve liftoff. On the bright side, you will not need heating. It works, but it is not what anyone would call fun.
A cheap used NVIDIA GPU with 4GB VRAM will transform the experience. The same image takes under a minute, your CPU stays cool, and you can still use your computer for other things while it generates.
I got an "out of memory" error.
Your image size is too large for your GPU's VRAM. Lower the width and height sliders and try again. 512x512 works on most 4GB cards. If even that fails, make sure no other applications are using your GPU (games, video editing, another AI tool, etc.).
Models and Files
What is a checkpoint?
A checkpoint is the main AI model file. It was trained on an enormous number of images (millions, at the very least), each with a text description of what is in the image. That is how the model learned what "corgi" or "sunset" or "warm lighting" looks like. Different checkpoints are trained on different massive collections of images, so they each produce different styles. See the guide for more details.
Where do I get checkpoints?
civitai.com is the most popular source. You can browse by style, see example images, and download for free. Make sure you download checkpoints built for sd1.5 (or 1.4). SDXL and SD 2.x checkpoints will not work with picoDiffusion.
What file format should I download?
.safetensors is the recommended format. It is the same data as .ckpt but in a safer container that cannot execute arbitrary code. If both are available, always choose .safetensors.
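The reason .ckpt is riskier: it is a Python pickle, and unpickling can execute code hidden inside the file. A .safetensors file is just a table of tensors. A small illustration (the file names are made up):

import torch
from safetensors.torch import load_file

# Loading a pickle-based checkpoint means unpickling it, which can run
# arbitrary code embedded in the file.
state_dict = torch.load("model.ckpt", map_location="cpu", weights_only=False)

# Loading a .safetensors file only reads tensors; nothing gets executed.
state_dict = load_file("model.safetensors", device="cpu")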
I added a new model file but it does not show up.
Refresh the page in your browser. The model list is fetched from the server every time the page loads, so new files should appear after a refresh.
If it still does not show up, try restarting the container (make down then make up).
If it still does not show up after that, check that the file is in the right place. Checkpoints go in cache/models/checkpoints/, VAEs go in cache/models/vae/, and LoRAs go in cache/models/loras/. The file should be a .safetensors file.
Beyond that, we are not sure. We did not include a file uploader because you would be uploading to your own machine and that seemed odd. But apparently you need one. Because of you. You did this. It is on the roadmap now. 😮💨
What is a LoRA and do I need one?
A LoRA is a small add-on file that tweaks the checkpoint's style or teaches it a new concept. They are completely optional. You can generate great images without one. If you want to explore specific styles (like vintage film looks or specific art styles), LoRAs are a fun way to do that. See the guide for details.
My LoRA does not seem to do anything.
There are several reasons this can happen:
- Missing trigger word. Many LoRAs require a specific word or phrase in your prompt to activate. Check the LoRA's download page for its trigger word.
- Wrong checkpoint. LoRAs are trained on a specific checkpoint. Their weights are tuned for that checkpoint's internal structure. If you use a LoRA with a different checkpoint, the effect might be subtle, weird, or completely absent. Check which checkpoint the LoRA was designed for.
- Weight too low. Try increasing the LoRA weight slider. Start at 0.8 and go up from there. Some LoRAs need to be pushed to 1.2 or higher before they are noticeable.
- Prompt is fighting the LoRA. Your prompt matters a lot. If the LoRA is supposed to give you a vintage film look but your prompt says "modern digital photography, clean, sharp", you are pulling in opposite directions. Try leaning your prompt towards the kind of output you expect from the LoRA — this helps a lot, above and beyond just the trigger words.
- Compatibility. picoDiffusion deliberately simplifies things compared to more advanced tools. Not every LoRA will work perfectly. Some LoRAs are built for specific workflows or tools and may not produce the same results here. If a LoRA is not doing what you expect, try a different one before assuming something is broken.
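To make the trigger word and weight points a little more concrete, here is a hedged sketch of a LoRA being applied via the Hugging Face diffusers library. It is not picoDiffusion's actual code, and the file names and trigger word are invented, but it shows where the trigger word and the weight actually go:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "cache/models/checkpoints/your_checkpoint.safetensors",  # hypothetical checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Load one LoRA on top of the checkpoint (hypothetical file).
pipe.load_lora_weights("cache/models/loras/vintage_film.safetensors")

image = pipe(
    "analogfilm photo of a corgi on a beach",  # "analogfilm" stands in for the LoRA's trigger word
    cross_attention_kwargs={"scale": 0.8},     # roughly what the LoRA weight slider controls
    num_inference_steps=24,
).images[0]
image.save("corgi.png")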
Prompts and Results
My images do not look like the examples I see online.
There are a few things going on here, and we want to be honest about them.
The checkpoint matters most. Different checkpoints produce very different results from the same prompt. If you are trying to reproduce something you saw online, check which checkpoint was used and try to use the same one.
Prompt weighting. Many online examples use syntax like (word:1.4) or ((word)) to emphasise parts of the prompt. picoDiffusion does not currently support prompt weighting, so those syntax elements have no effect. The words themselves still work, just at equal weight.
This is a simplified tool. More advanced Stable Diffusion tools have features like upscalers, hires fix (a two-pass generation that adds detail), inpainting, ControlNet, regional prompting, and many other things that can dramatically improve results. picoDiffusion does not have any of those. It uses widely supported, stable versions of the underlying libraries and focuses on the core generation workflow.
The images you see on civitai and other galleries are often the result of someone using all of those advanced features, carefully tuned settings, and a lot of iteration. Your images from picoDiffusion are what comes straight out of the model with no post-processing — and for learning what the controls do and how the models work, that is exactly what you want. The fancy stuff comes later, if you want it.
Why do hands and fingers look weird?
This is a well-known limitation of Stable Diffusion 1.5. The model struggles with hands, fingers, and sometimes faces. Adding terms like "bad hands", "extra fingers", "deformed hands" to your negative prompt can help, but it will not fix the issue entirely. It is just something sd1.5 does.
Can I reproduce an image?
Yes, kinda, maybe. If you use the same seed, prompt, negative prompt, checkpoint, VAE, LoRA, sampler, CFG, steps, width, and height, you will get an image that is visually very, very close. Not necessarily pixel-identical — floating point math and GPU differences can introduce tiny variations — but close enough that you would not notice the difference.
All of these settings are embedded in the PNG file when you save an image, so you can always look them up later with any EXIF viewer.
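If you ever want to pull those settings back out of a saved image with a script instead of an EXIF viewer, Pillow can read the PNG text chunks. A small sketch (the file name is made up, and the exact key names depend on how the image was saved):

from PIL import Image

img = Image.open("corgi_12345.png")  # hypothetical file name
for key, value in img.info.items():  # PNG text chunks end up in .info
    print(f"{key}: {value}")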
What are those quality tags like "highly detailed" and "4k"?
They are suggestions, not magic switches. Some checkpoints respond to them, others mostly ignore them. They work because the model saw images tagged with those words during training. Try them and see if they help with your checkpoint — if they do not make a noticeable difference, leave them out.
Design Decisions
Why do you use an older version of PyTorch / CUDA?
The Dockerfile pins to CUDA 12.6 (cu126), which is the last version that supports Pascal-era GPUs (like the Quadro P2000 and GTX 1050 Ti). Newer versions of PyTorch drop support for these older cards.
We made this choice deliberately. picoDiffusion is meant to be approachable — something you can run on the hardware you already have, even if that hardware is old and cheap. A newer PyTorch might be faster on modern GPUs, but it would lock out the people this project is built for.
This is not a tool for mass-producing motion pictures or running the latest 100GB SDXL models. It is a "my first Stable Diffusion" tool. It is designed to help you learn what all the controls do, experiment with prompts and styles, and have fun generating images without needing expensive hardware.
Why only sd1.5? What about SDXL or newer models?
sd1.5 checkpoints are small (2-4GB), run well on 4GB GPUs, and have a massive library of community models, LoRAs, and VAEs. SDXL models are much larger (6-7GB+), need significantly more VRAM, and are slower to generate.
For learning and experimenting, sd1.5 is the sweet spot. It is fast enough to iterate quickly and forgiving enough to run on modest hardware.
I have outgrown this. What should I use next?
If you want more power, more models, more control, and a node-based workflow system, take a look at ComfyUI. It supports SDXL, Flux, and many other model architectures. It has a steeper learning curve, but now that you understand what checkpoints, LoRAs, samplers, and CFG do, you will be in a much better position to use it.
That was kind of the point of picoDiffusion — to give you the foundation so the more powerful tools make sense when you get there. And honestly? If you have reached this point, we are thrilled. You used it, you learned, and now it is not enough. That is exactly what we wanted to accomplish. Thank you.
Docker and Setup
Why is this not on Docker Hub?
We would like it to be. A pre-built image you could just docker pull and run would be ideal. But Docker Hub has its own requirements, and the image includes PyTorch and CUDA libraries that have their own licences and distribution terms. We are honestly not sure if we can redistribute a container with all of that bundled in. We do not have a legal team to ask.
Building locally takes a few minutes the first time and Docker caches it after that. It is not as convenient as a pull, but it works and we know we are not breaking any rules.
If you know more about this than we do, we would love to hear from you. Open an issue.
The first build is really slow.
That is normal. The first build downloads PyTorch (~2GB) and all the other Python dependencies from their respective repositories. How long this takes depends on your internet speed — we cannot bundle these libraries ourselves for licensing reasons. On a decent connection, expect 5 to 10 minutes. On a slower connection, maybe a bit longer.
This is a good time to give some attention to your corgi. Or have a cup of tea. Or both. 5 to 10 minutes is a pretty low barrier to entry for your own AI image generator, and you only need to do it once. Docker caches everything, so the second build (and every build after that) is fast. Only changes to the requirements or Dockerfile trigger a full rebuild.
The first image generation is slow but after that it is fast.
Two things happen on the first generation: the checkpoint is loaded into GPU memory, and some small config files are downloaded from HuggingFace. Both are cached — the checkpoint stays in memory until you switch to a different one, and the HuggingFace files are saved on your computer permanently.
Can I run this on a server or expose it to the internet?
Please do not.
But I want my friends to use it too!
We get it. Making pictures with AI is fun and you want to share that. But picoDiffusion has no login system, no way to limit who can use it, and no protection against someone hammering the API and tying up your GPU for hours. Putting it on the internet would be like leaving your front door open with a sign that says "free GPU time."
It is also single-threaded — it makes one goofy picture at a time. If two people hit Generate at the same moment, one of them waits. It is not built for that.
Instead, consider having a LAN party. Seriously. Get your friends on the same network, order some pizza, and everyone can open http://your-ip:8004 in their browser and take turns making silly cat pictures together. That is local network use and it is totally fine — just do not put it on the internet. Also, we know what kind of images you are making. Try to control yourself.
If you want something that is actually built for multiple users with proper authentication and queuing, can we interest you in ComfyUI?
Do I need to restart the container when I add new model files?
No. Just refresh the page in your browser. Checkpoints, VAEs, and LoRAs are scanned from the directories every time the model list is requested.
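If you are wondering what that scan amounts to, it is essentially a directory listing. A rough sketch of the idea (not picoDiffusion's actual code):

from pathlib import Path

def list_models(kind: str) -> list[str]:
    # kind is "checkpoints", "vae", or "loras", matching the directories
    # mentioned earlier in this FAQ.
    return sorted(p.name for p in Path("cache/models", kind).glob("*.safetensors"))

print(list_models("checkpoints"))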
How do I update to a newer version?
Pull the latest code and rebuild:
git pull
make up
Your model files and HuggingFace cache are stored outside the container, so nothing is lost during a rebuild.
The Important Questions
Why is it called picoDiffusion?
"Pico" means really, really small. Like, a trillionth of something. That felt right for a Stable Diffusion tool that is deliberately tiny, runs on old hardware, and does not try to be everything to everyone. Also, all the good names were taken.
Why is the default prompt a corgi?
Because corgis are perfect. Next question. (Who would generate anything other than pictures of corgis anyway? Degenerates, the whole lot of them.)
Can I make NSFW images?
I mean, you can. I guess. There is no content filter. It runs on your computer, it is your business. We are not your mum. And yes, that is probably what most of you degenerates are going to do first.
But did you know there are checkpoints designed for architectural visualisation? Models you can use to design spaceships? LoRAs that give everyone absurdly tall hats? There is a whole world of creative, weird, wonderful things you can make that do not involve... that. Look, something shiny!
The checkpoint determines what the model can generate. Some checkpoints are trained on SFW data only and will produce weird blurry messes if you push them somewhere they were not trained for. Others are not. The checkpoint's page on civitai usually makes it pretty clear which is which.
Is this going to mine bitcoin on my computer?
No. It is open source — you can read every line of code on GitHub. It does exactly one thing: turn text into pictures. Badly, sometimes, but honestly.
Why does my image have weird text or letters in it?
sd1.5 cannot do text. It saw text in training images and learned that images sometimes have letter-shaped things in them, but it has no idea what letters are or how spelling works. If you are getting random gibberish text in your images, add "text, words, letters, watermark" to your negative prompt. It helps, but it will not fix it completely. This is just a thing sd1.5 does.
Can I use multiple LoRAs at the same time?
Not right now. picoDiffusion supports one LoRA per generation. Could we add multi-LoRA support? Yes, technically. And maybe we will some day. But stacking LoRAs introduces a lot of complexity — the order you load them matters, the weights interact with each other, and troubleshooting why something looks wrong gets much harder. For a tool that is all about learning the basics with the fewest barriers, one LoRA at a time keeps things understandable.
If you have reached the point where you need to stack multiple LoRAs, that is a great sign — it means you have learned what they do and you want more control. Thank you for learning on our little program. That makes me and my corgi the happiest. Can we interest you in ComfyUI?
Can I train my own checkpoint or LoRA with this?
No. picoDiffusion is for generating images (inference), not for training models. Training requires significantly more VRAM, different software, and a lot of patience. If you want to train your own LoRA, look into kohya-ss. But maybe generate a few thousand pictures first and make sure you actually enjoy this before going down that rabbit hole.
How much disk space do I need?
The Docker image itself is around 8-10GB (mostly PyTorch). Each checkpoint is 2-4GB. VAEs are around 300-800MB. LoRAs are tiny, usually 10-150MB. So realistically, budget about 15-20GB for the tool plus a couple of checkpoints. You will probably end up downloading more checkpoints than you planned because "just one more" is a real thing.
Why 24 steps as the default?
Because even numbers are nice. Also, 24 steps hits the sweet spot of quality versus speed for most checkpoints. 25 would also work. We just like even numbers. Have you seen how unbalanced 25 is visually? We got tired of looking at it.
I generated something amazing and I want to show people.
That is not a question, but we are happy for you. Click "save image" under the picture to download it. The file name includes the seed number so you can find it later. All the settings (prompt, checkpoint, sampler, everything) are embedded in the PNG metadata, so you can always reproduce it or share the exact recipe with someone.
I generated something horrifying.
That is also not a question. Welcome to Stable Diffusion. The hands will haunt your dreams. It gets better. Sort of.