The uv package manager is an incredibly fast, modern tool for managing Python projects. While it makes dependency management a breeze, using it for machine learning tasks that require GPU acceleration, like running Hugging Face Diffusers, involves a few specific configuration steps.
Here’s a complete walkthrough of how to set up a project, handle CUDA dependencies, and optimize a script for performance.
Step 1: Define Your Project with pyproject.toml
First, define your project and its dependencies in a pyproject.toml file. This centralizes all your project’s metadata.
[project]
name = "hf-diffusers-test"
version = "0.1.0"
description = "A project to test Hugging Face Diffusers with uv."
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"torch",
"diffusers",
"transformers",
"accelerate",
"xformers"
]
Step 2: Configure uv for CUDA
This is the most important step. The versions of PyTorch that support CUDA are hosted on a separate package index, not the standard PyPI, so you need to tell uv where to look for them.
Add the following section to your pyproject.toml.
# Add this section to configure uv's package sources
[tool.uv.pip]
# This points to the index for PyTorch with CUDA 12.1 support.
extra-index-url = ["https://download.pytorch.org/whl/cu121"]
Note: The URL must be inside a list ([...]). Providing it as a plain string will cause a parsing error.
Step 3: Create and Sync Your Environment
While uv can sometimes create an environment automatically, the most reliable workflow is an explicit two-step process:
- Create the virtual environment: This command creates an isolated .venv folder in your project directory.
uv venv
- Sync the dependencies: This command reads your pyproject.toml, finds the .venv, and installs all the packages into it, using the extra index for the CUDA-enabled PyTorch.
uv pip sync pyproject.toml
If you already have an existing environment, you can skip step 1.
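Once the sync finishes, it's worth confirming that the CUDA-enabled build of PyTorch actually landed in the environment. This one-liner is just my own quick check rather than part of the uv workflow; it should print True followed by the CUDA version (e.g. 12.1):
uv run python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
If it prints False or None, the wheels most likely came from the default PyPI index rather than the cu121 index.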
Step 4: Authenticate with Hugging Face
To download models like Stable Diffusion, you need to log in to your Hugging Face account from the command line.
- Get a Token: Go to your Hugging Face settings and generate a new access token with “read” permissions.
- Log In: Run the login command using uv run. It will prompt you to paste your token.
uv run huggingface-cli login
This command stores your token globally in your user’s home directory (~/.cache/huggingface/token), so you only need to do this once per machine.
Troubleshooting Tip: If you have an old token set as an HF_TOKEN environment variable, it will override the new one and cause an authentication error. You can fix this for your current session by running unset HF_TOKEN.
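If you'd rather verify the login before kicking off a large model download, a minimal check (my own addition, using the whoami helper from huggingface_hub, which is already installed as a dependency of the libraries above) looks like this:
uv run python -c "from huggingface_hub import whoami; print(whoami()['name'])"
It should print your Hugging Face username; if it raises an error instead, the token wasn't picked up.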
Step 5: Create and Run the Generation Script
With the environment set up, you can now write your Python script. The key to fast subsequent generations is to load the model once and then enter a loop to generate images.
Here is a final, optimized script that does exactly that:
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import time
import random
# --- 1. SETUP (RUNS ONCE) ---
print("Loading model and applying optimizations... This will take a moment.")
model_id = "sd-legacy/stable-diffusion-v1-5"
scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe = pipe.to("cuda")
# Apply compatible optimizations to keep memory usage manageable
pipe.enable_vae_slicing()
pipe.enable_attention_slicing()
print("✅ Model loaded. Ready for prompts.")
print("Type your prompt and press Enter. Type 'exit' to quit.")
# --- 2. GENERATION LOOP (RUNS REPEATEDLY) ---
while True:
    prompt = input("\nprompt> ")
    if prompt.lower() == 'exit':
        print("Exiting.")
        break

    # Generation settings
    negative_prompt = "text, watermark, blurry"
    num_steps = 20
    guidance = 8.0
    seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator(device="cuda").manual_seed(seed)

    print(f"Generating with seed: {seed}")
    start_time = time.monotonic()

    # The model is already in memory, so this part is fast
    with torch.inference_mode():
        image = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=num_steps,
            guidance_scale=guidance,
            generator=generator
        ).images[0]

    end_time = time.monotonic()
    print(f"Image generated in {end_time - start_time:.2f} seconds.")

    filename = f"image_{int(time.time())}.png"
    image.save(filename)
    print(f"Image saved to {filename}")
To run your project, simply execute uv run main.py. The script will perform the slow, one-time model load and then repeatedly ask for prompts, generating each new image quickly because the model is already in GPU memory. The model is only unloaded when the script terminates.
However, if you just want a Python script to run a single image generation and exit, then you can use this script:
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import time
# --- Settings ---
model_id = "sd-legacy/stable-diffusion-v1-5"
prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"
num_steps = 20
guidance = 8.0
seed = 229314574930376
# --- 1. Scheduler and Pipeline Setup ---
scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe = pipe.to("cuda")
# --- 2. Advanced Optimizations (that are compatible) ---
print("Applying compatible optimizations...")
# For MEMORY: decode the VAE in slices so large images fit comfortably in VRAM
pipe.enable_vae_slicing()
# For MEMORY: compute attention in slices (a small speed cost for a big memory saving)
pipe.enable_attention_slicing()
# --- 3. Generation ---
generator = torch.Generator(device="cuda").manual_seed(seed)
print("Starting timed generation...")
start_time = time.monotonic()
with torch.inference_mode():
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_steps,
        guidance_scale=guidance,
        generator=generator
    ).images[0]
end_time = time.monotonic()
print(f"Image generated in {end_time - start_time:.2f} seconds.")
# Save the final image
filename = "image.png"
image.save(filename)
print(f"Image saved to {filename}")
I ran both of these scripts on my local NVIDIA Quadro P400 quite comfortably.
Here is the output of the second (run-once) script:
$: uv run main.py
Loading pipeline components...: 100%|███████████████████████████████████████████████████████| 7/7 [00:11<00:00, 1.69s/it]
Applying compatible optimizations...
Starting timed generation...
100%|█████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00, 1.27it/s]
Image generated in 17.67 seconds.
Image saved to image.png
And here is the generated image.

Performance
Whilst I went to some effort to optimise the image generation time, I couldn’t quite match ComfyUI inference times. Even with the ‘optimised’ script it was at least 5 seconds slower than a comparable ComfyUI workflow. I think I’ll need to dig into the Hugging Face Diffusers library to find out how to optimise further. I might do another post on this at some point.
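One thing I haven't actually wired into the scripts above is xformers, even though it's already listed in the project's dependencies. As a hedged sketch (I haven't benchmarked these on the P400), Diffusers exposes a memory-efficient attention switch, and on PyTorch 2.x the UNet can also be compiled; either could be tried in the setup section in place of the slicing calls:
# Untested tweaks to the setup section above; swap in for the slicing calls.
pipe.enable_xformers_memory_efficient_attention()  # uses the xformers dependency

# On PyTorch 2.x, compiling the UNet can speed up steady-state inference
# (the first generation after compiling will be noticeably slower).
# pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)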
Quick Summary: The Critical CUDA Setup
For those who might struggle, getting the GPU to work with uv boils down to one essential configuration.
The core problem is that CUDA-enabled PyTorch isn’t on the standard PyPI. The solution is to tell uv where to find it by adding the following to your pyproject.toml file:
[tool.uv.pip]
extra-index-url = ["https://download.pytorch.org/whl/cu121"]
After adding that, create and sync your environment with these two commands:
uv venv
uv pip sync pyproject.toml
This correctly configures your project to download and install the GPU-compatible CUDA libraries, which makes everything else possible!