The uv package manager is an incredibly fast, modern tool for managing Python projects. While it makes dependency management a breeze, using it for machine learning tasks that require GPU acceleration, like running Hugging Face Diffusers, involves a few specific configuration steps.
Here’s a complete walkthrough of how to set up a project, handle CUDA dependencies, and optimize a script for performance.
Step 1: Define Your Project with pyproject.toml
First, define your project and its dependencies in a pyproject.toml file. This centralizes all your project’s metadata.
[project]
name = "hf-diffusers-test"
version = "0.1.0"
description = "A project to test Hugging Face Diffusers with uv."
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"torch",
"diffusers",
"transformers",
"accelerate",
"xformers"
]
Step 2: Configure uv for CUDA
This is the most important step. The versions of PyTorch that support CUDA are hosted on a separate package index, not the standard PyPI, so you need to tell uv where to look for them.
Add the following section to your pyproject.toml.
# Add this section to configure uv's package sources
[tool.uv.pip]
# This points to the index for PyTorch with CUDA 12.1 support.
extra-index-url = ["https://download.pytorch.org/whl/cu121"]
Note: The URL must be inside a list ([...]). Providing it as a plain string will cause a parsing error.
Step 3: Create and Sync Your Environment
While uv can sometimes create an environment automatically, the most reliable workflow is an explicit two-step process:
- Create the virtual environment: This command creates an isolated .venv folder in your project directory.
uv venv
- Sync the dependencies: This command reads your pyproject.toml, finds the .venv, and installs all the packages into it, using the extra index for the CUDA-enabled PyTorch.
uv pip sync pyproject.toml
If you already have an existing environment, you can skip step 1.
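Once the sync finishes, it's worth confirming that the CUDA-enabled build of PyTorch actually landed in the environment. This one-liner is just my own quick check rather than part of the uv workflow; it should print True followed by the CUDA version (e.g. 12.1):
uv run python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
If it prints False or None, the wheels most likely came from the default PyPI index rather than the cu121 index.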
Step 4: Authenticate with Hugging Face
To download models like Stable Diffusion, you need to log in to your Hugging Face account from the command line.
- Get a Token: Go to your Hugging Face settings and generate a new access token with “read” permissions.
- Log In: Run the login command using uv run. It will prompt you to paste your token.
uv run huggingface-cli login
This command stores your token globally in your user’s home directory (~/.cache/huggingface/token), so you only need to do this once per machine.
Troubleshooting Tip: If you have an old token set as an HF_TOKEN environment variable, it will override the new one and cause an authentication error. You can fix this for your current session by running unset HF_TOKEN.
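If you'd rather verify the login before kicking off a large model download, a minimal check (my own addition, using the whoami helper from huggingface_hub, which is already installed as a dependency of the libraries above) looks like this:
uv run python -c "from huggingface_hub import whoami; print(whoami()['name'])"
It should print your Hugging Face username; if it raises an error instead, the token wasn't picked up.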
Step 5: Create and Run the Generation Script
With the environment set up, you can now write your Python script. The key to fast subsequent generations is to load the model once and then enter a loop to generate images.
Here is a final, optimized script that does exactly that:
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import time
import random
# --- 1. SETUP (RUNS ONCE) ---
print("Loading model and applying optimizations... This will take a moment.")
model_id = "sd-legacy/stable-diffusion-v1-5"
scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe = pipe.to("cuda")
# Apply compatible optimizations to keep memory usage manageable
pipe.enable_vae_slicing()
pipe.enable_attention_slicing()
print("✅ Model loaded. Ready for prompts.")
print("Type your prompt and press Enter. Type 'exit' to quit.")
# --- 2. GENERATION LOOP (RUNS REPEATEDLY) ---
while True:
    prompt = input("\nprompt> ")
    if prompt.lower() == 'exit':
        print("Exiting.")
        break

    # Generation settings
    negative_prompt = "text, watermark, blurry"
    num_steps = 20
    guidance = 8.0
    seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator(device="cuda").manual_seed(seed)

    print(f"Generating with seed: {seed}")
    start_time = time.monotonic()

    # The model is already in memory, so this part is fast
    with torch.inference_mode():
        image = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=num_steps,
            guidance_scale=guidance,
            generator=generator
        ).images[0]

    end_time = time.monotonic()
    print(f"Image generated in {end_time - start_time:.2f} seconds.")

    filename = f"image_{int(time.time())}.png"
    image.save(filename)
    print(f"Image saved to {filename}")
To run your project, simply execute uv run main.py. The script will perform the slow, one-time model load and then repeatedly ask for prompts, generating each new image quickly because the model is already in GPU memory. The model is only unloaded when the script terminates.
However, if you just want a Python script to run a single image generation and exit, then you can use this script:
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import time
# --- Settings ---
model_id = "sd-legacy/stable-diffusion-v1-5"
prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"
num_steps = 20
guidance = 8.0
seed = 229314574930376
# --- 1. Scheduler and Pipeline Setup ---
scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe = pipe.to("cuda")
# --- 2. Advanced Optimizations (that are compatible) ---
print("Applying compatible optimizations...")
# For MEMORY: decode the VAE in slices so large images fit comfortably in VRAM
pipe.enable_vae_slicing()
# For MEMORY: compute attention in slices (a small speed cost for a big memory saving)
pipe.enable_attention_slicing()
# --- 3. Generation ---
generator = torch.Generator(device="cuda").manual_seed(seed)
print("Starting timed generation...")
start_time = time.monotonic()
with torch.inference_mode():
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_steps,
        guidance_scale=guidance,
        generator=generator
    ).images[0]
end_time = time.monotonic()
print(f"Image generated in {end_time - start_time:.2f} seconds.")
# Save the final image
filename = "image.png"
image.save(filename)
print(f"Image saved to {filename}")
I ran both of these scripts on my local NVIDIA Quadro P400 quite comfortably.
Here is the output of the second (run-once) script:
$: uv run main.py
Loading pipeline components...: 100%|███████████████████████████████████████████████████████| 7/7 [00:11<00:00, 1.69s/it]
Applying compatible optimizations...
Starting timed generation...
100%|█████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00, 1.27it/s]
Image generated in 17.67 seconds.
Image saved to image.png
And here is the generated image.

Performance
Whilst I went to some effort to optimise the image generation time, I couldn’t quite match ComfyUI inference times. Even with the ‘optimised’ script it was at least 5 seconds slower than a comparable ComfyUI workflow. I think I’ll need to dig into the Hugging Face Diffusers library to find out how to optimise further. I might do another post on this at some point.
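One thing I haven't actually wired into the scripts above is xformers, even though it's already listed in the project's dependencies. As a hedged sketch (I haven't benchmarked these on the P400), Diffusers exposes a memory-efficient attention switch, and on PyTorch 2.x the UNet can also be compiled; either could be tried in the setup section in place of the slicing calls:
# Untested tweaks to the setup section above; swap in for the slicing calls.
pipe.enable_xformers_memory_efficient_attention()  # uses the xformers dependency

# On PyTorch 2.x, compiling the UNet can speed up steady-state inference
# (the first generation after compiling will be noticeably slower).
# pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)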
Quick Summary: The Critical CUDA Setup
For those who might struggle, getting the GPU to work with uv boils down to one essential configuration.
The core problem is that CUDA-enabled PyTorch isn’t on the standard PyPI. The solution is to tell uv where to find it by adding the following to your pyproject.toml file:
[tool.uv.pip]
extra-index-url = ["https://download.pytorch.org/whl/cu121"]
After adding that, create and sync your environment with these two commands:
uv venv
uv pip sync pyproject.toml
This correctly configures your project to download and install the GPU-compatible CUDA libraries, which makes everything else possible!