In this series of posts I’ll be covering the fast.ai course Practical Deep Learning for Coders by Jeremy Howard. This is essentially a review of my notes compiled as I progress through the course. It’s more of a personal reference, but hopefully others might find these posts useful, especially folks who are actively taking the fastai course.
In this first lesson we learn how to build a simple image classifier via the fastai API, which is built on PyTorch. It serves as a demonstration of how quickly and easily anyone can train existing models using modern software frameworks such as fastai.
Download Sample Images
Before getting into the details of the image classifier code we need some data. Let’s start by understanding how to obtain and process the data we will feed into the model during the training process. First, we will download a couple of sample images using the DuckDuckGo API. We’ll define a simple function to access the API (the imports below cover everything used in the rest of this post):

```python
from duckduckgo_search import DDGS
from fastdownload import download_url
from fastai.vision.all import *
import matplotlib.pyplot as plt
import time

def search_images(keywords, max_images=200):
    # Query the DuckDuckGo image search and return just the image URLs
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')
```
Now we can download and display two images: one of a bird and one of a forest.
```python
urls = search_images('bird photos', max_images=1)
urls[0]
```
```
'https://images.pexels.com/photos/1661179/pexels-photo-1661179.jpeg?cs=srgb&dl=pexels-roshan-kamath-1661179.jpg&fm=jpg'
```
```python
dest = 'bird.jpg'
download_url(urls[0], dest, show_progress=False)

im = Image.open(dest)
im.to_thumb(256,256)
```

```python
download_url(search_images('forest photos', max_images=1)[0], 'forest.jpg', show_progress=False)
Image.open('forest.jpg').to_thumb(256,256)
```
Download Images to Create a Dataset
To train a model we will need multiple images. Downloading a set of images for each category follows a similar process, and these will form the dataset for our bird classifier. Take a look at the following code.
```python
searches = 'forest','bird'
path = Path('bird_or_not')

if not path.exists():
    for o in searches:
        dest = (path/o)
        dest.mkdir(exist_ok=True, parents=True)
        download_images(dest, urls=search_images(f'{o} photo')[:200])
        time.sleep(5)  # pause between searches to be polite to the API
        resize_images(dest, max_size=400, dest=dest)
```

This creates a bird_or_not folder with two sub-folders, bird and forest. We then perform a web search for the terms ‘bird photo’ and ‘forest photo’, and download a maximum of 200 images for each term into the relevant folders (bird_or_not/bird, bird_or_not/forest).
Let’s see how many images were actually downloaded.
```python
image_extensions = ['.jpg', '.jpeg', '.png']
fldrs = ['bird', 'forest']

def count_images(folders):
    # Count files in each folder whose suffix is a known image extension
    for folder in folders:
        folder_path = path/folder
        num_images = len([f for f in folder_path.iterdir() if f.suffix.lower() in image_extensions])
        print(f"{folder}: {num_images} images")

count_images(fldrs)
```
```
bird: 176 images
forest: 185 images
```
Downloaded Image Cleanup
As an aside, this section analyses the downloaded files and their extensions to see what image types were found. The files are then cleaned to remove any corrupt images, and any whose extensions do not match .jpg or .jpeg.
First, get a list of all downloaded file extensions, not just .jpg, .jpeg, and .png.
```python
all_suffixes = set()

def get_extensions(folders):
    # Record every file suffix found and report the file count per folder
    for folder in folders:
        folder_path = path / folder
        image_files = [f for f in folder_path.iterdir() if f.is_file()]
        # Add suffixes to the set
        for f in image_files:
            all_suffixes.add(f.suffix.lower())
        print(f"{folder}: {len(image_files)} image files")

get_extensions(fldrs)
print("\nAll suffixes found:", all_suffixes)
```
```
bird: 190 image files
forest: 191 image files

All suffixes found: {'.webp', '.jpg!d', '.png', '.jpeg', '.gif', '.jpg'}
```
Let’s remove any corrupted files.
```python
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
```
```
0
```
```python
get_extensions(fldrs)
print("\nAll suffixes found:", all_suffixes)
```
```
bird: 184 image files
forest: 185 image files

All suffixes found: {'.webp', '.jpg!d', '.png', '.jpeg', '.gif', '.jpg'}
```

```python
path
```
```
Path('bird_or_not')
```
Let’s remove all files whose extensions don’t match .jpg or .jpeg.
```python
exts = ['.png', '.jpg!d', '.gif', '.webp']

def delete_files_by_extension(exts):
    extensions = [ext.lower() for ext in exts]
    # Find matching files before deletion
    files_to_delete = [f for f in path.rglob("*") if f.is_file() and f.suffix.lower() in extensions]
    print(f"Found {len(files_to_delete)} files with specified extensions before deletion.")
    # Delete files
    for file in files_to_delete:
        file.unlink()
    # Confirm how many remain
    remaining = [f for f in path.rglob("*") if f.is_file() and f.suffix.lower() in extensions]
    print(f"{len(remaining)} matching files remain after deletion.")

delete_files_by_extension(exts)
```
```
Found 1 files with specified extensions before deletion.
0 matching files remain after deletion.
```
```python
all_suffixes = set()
get_extensions(fldrs)
print("\nAll suffixes found:", all_suffixes)
```
```
bird: 172 image files
forest: 179 image files

All suffixes found: {'.jpeg', '.jpg'}
```
So we end up with the following numbers of cleaned bird and forest images that we will use for training our model.

```python
count_images(fldrs)
```
```
bird: 172 images
forest: 179 images
```
Viewing the Dataset
To train a model with fastai, we first need to create a DataLoaders object, which manages both the training and validation datasets. The training set is used to teach the model, while the validation set helps evaluate how well the model performs on unseen data. Fastai provides a powerful abstraction called DataBlock to define how data should be loaded, labeled, and transformed. In the example below, we use DataBlock to create our dataloaders from a folder of images, applying resizing and splitting the data into training and validation sets.
```python
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path, bs=32)
```

- blocks=(ImageBlock, CategoryBlock) - specifies that each input is an image and the corresponding label is a category.
- get_items=get_image_files - tells fastai how to find the input items, in this case all image files in a folder.
- splitter=RandomSplitter(valid_pct=0.2, seed=42) - randomly splits the dataset into 80% training and 20% validation, using a fixed seed for reproducibility.
- get_y=parent_label - extracts the label from the name of the parent folder (common in image classification datasets).
- item_tfms=[Resize(192, method='squish')] - resizes all images to 192x192 pixels using the “squish” method (which does not preserve aspect ratio).
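As a quick sanity check (my own addition, not from the lesson), we can confirm the categories that CategoryBlock built from the folder names, along with the sizes of the resulting split:

```python
# The vocabulary is derived from the parent folder names, sorted alphabetically
print(dls.vocab)                              # ['bird', 'forest']
print(len(dls.train_ds), len(dls.valid_ds))   # roughly an 80/20 split
```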
Here is a sample preview of the dataset images using the show_batch function:
```python
dls.show_batch(max_n=6)
```
Check GPU Availability
Before we train our model, let’s verify we have a GPU available locally.
```python
torch.cuda.is_available()
```
```
True
```
```python
default_device()
```
```
device(type='cuda', index=0)
```
More specific details about the GPU can be displayed if one is detected.
```python
if torch.cuda.is_available():
    device_id = torch.cuda.current_device()
    props = torch.cuda.get_device_properties(device_id)
    print(f"Device ID : {device_id}")
    print(f"Device Name : {props.name}")
    print(f"Total Memory : {props.total_memory / 1e9:.2f} GB")
    print(f"Compute Capability : {props.major}.{props.minor}")
    print(f"Multiprocessors : {props.multi_processor_count}")
    print(f"CUDA Version : {torch.version.cuda}")
    print(f"Device : {torch.cuda.get_device_name(device_id)}")
else:
    print("No GPU available.")
```
```
Device ID : 0
Device Name : Quadro P4000
Total Memory : 8.59 GB
Compute Capability : 6.1
Multiprocessors : 14
CUDA Version : 12.4
Device : Quadro P4000
```
On my desktop machine I have an NVIDIA Quadro P4000 graphics card installed, which is adequate for this training run.
Training the Model
We can now train (fine-tune) a model using the fastai API. We can use any readily available model, but the ResNet18 CNN is fine for now. You can view details of this and other ResNet models here.
```python
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.743350 | 0.310005 | 0.100000 | 00:02 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.101221 | 0.006839 | 0.000000 | 00:01 |
| 1 | 0.055796 | 0.006226 | 0.000000 | 00:01 |
| 2 | 0.042845 | 0.005840 | 0.000000 | 00:01 |
The model trained very quickly and achieved high accuracy. Training ran for 3 epochs, which is three complete passes through all the training data. Note that there are two results tables: the first covers an initial epoch in which only the newly added head of the network is trained while the pretrained body stays frozen; the second covers the three requested epochs with all layers unfrozen.
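For intuition, here is a rough sketch of what fine_tune is doing behind the scenes. This is a simplification (fastai’s actual implementation also adjusts learning rates between the two phases), but it explains the two tables above:

```python
# A simplified sketch of learn.fine_tune(3), not fastai's exact implementation
learn.freeze()          # only the randomly initialised head is trainable
learn.fit_one_cycle(1)  # produces the first results table
learn.unfreeze()        # make all layers trainable
learn.fit_one_cycle(3)  # produces the second results table
```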
Using the Model for Predictions
We can now use our fine-tuned model to make predictions on whether an image is a bird or not.
Single Image Prediction
```python
is_bird,_,probs = learn.predict(PILImage.create('bird.jpg'))

im = Image.open('bird.jpg')
im.to_thumb(256,256)
```

```python
print(f"This is a: {is_bird}.")
print(f"Probability it's a bird: {probs[0]:.4f}")
```
```
This is a: bird.
Probability it's a bird: 1.0000
```
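Note that probs is ordered according to dls.vocab, which is alphabetical, so index 0 corresponds to ‘bird’. A small loop (my own addition) pairs each label with its predicted probability:

```python
# Pair each category in the vocab with its predicted probability
for label, p in zip(learn.dls.vocab, probs):
    print(f"{label}: {p:.4f}")
```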
Validation Dataset Predictions
We can also show more predictions based on our validation dataset. This is very important as we need to know if our trained model is good at making predictions on data it has not seen before.
```python
# Get predictions and targets from the validation set
preds, targs = learn.get_preds(ds_idx=1)
pred_labels = preds.argmax(dim=1)
len(preds)
```
```
70
```
We have 70 predictions, which is roughly the number of images we expect in the validation dataset.
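While we’re here, we can compute the validation accuracy manually from these predictions (my own addition); it should agree with the error_rate reported during training:

```python
# Fraction of validation items where the predicted class matches the target
accuracy = (pred_labels == targs).float().mean()
print(f"Validation accuracy: {accuracy:.4f}")
```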
```python
count_images(fldrs)
```
```
bird: 172 images
forest: 179 images
```

```python
print(f"20% of images in the dataset: {(172+179)*.2:.0f}")
```
```
20% of images in the dataset: 70
```
Or we can simply inspect the split dataset sizes directly from the dataloaders.
print(f"Training set size: {len(dls.train_ds)}")
print(f"Validation set size: {len(dls.valid_ds)}")Training set size: 281
Validation set size: 70
We can display a sample of the validation dataset with predicted and true labels. As you can see below, the fine-tuned model predicts the correct label for every image it didn’t see during training.
```python
# Access the actual validation dataset (items + transforms)
valid_ds = dls.valid_ds

n = 6
fig, axs = plt.subplots(1, n, figsize=(15,5))
for i in range(n):
    img, true_lbl = valid_ds[i]
    pred_lbl = dls.vocab[pred_labels[i]]
    actual_lbl = dls.vocab[true_lbl]
    img.show(ctx=axs[i], title=f"P:{pred_lbl}\nT:{actual_lbl}")
plt.tight_layout()
```
There is an easier way, though, to display predictions from the validation set: the fastai show_results() function.
```python
learn.show_results()
```
Here we see that all validation sample predictions result in the correct data label.
We will continue to work with image classifiers in the next lesson.
Image Classification of Paintings
Before we move on to the next lesson, I wanted to build my own example of an image classifier to check that I understood all the steps: one that detects whether a painting was painted in a certain style. This time there are three categories, to see how the model copes with an additional category of input data. The styles of painting we’ll focus on are: Impressionism, Cubism, and Surrealism.
Download Painting Images
Let’s generalize the code to search and download images into a reusable function.
```python
def generate_image_dataset(pth, search_terms, max_images=200):
    # Only build the dataset if the target folder doesn't already exist
    if not pth.exists():
        for o in search_terms:
            dest = (pth/o)
            dest.mkdir(exist_ok=True, parents=True)
            download_images(dest, urls=search_images(f'{o} photo')[:max_images])
            time.sleep(5)
            resize_images(dest, max_size=400, dest=dest)

searches = ['impressionism', 'cubism', 'surrealism']
path = Path('paintings')
generate_image_dataset(path, searches)
```

Let’s see how many images were downloaded and the file types.
```python
all_suffixes = set()
fldrs = ['impressionism', 'cubism', 'surrealism']
get_extensions(fldrs)
print("\nAll suffixes found:", all_suffixes)
```
```
impressionism: 190 image files
cubism: 191 image files
surrealism: 191 image files

All suffixes found: {'.webp', '.jpg', '.jpeg', '.png'}
```
Let’s remove any corrupted files as we did before.
```python
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
```
```
17
```
```python
delete_files_by_extension(exts)
```

So after cleaning we have the following image counts for our three types of paintings.

```python
count_images(fldrs)
```
```
impressionism: 174 images
cubism: 174 images
surrealism: 181 images
```
Create Dataloaders
Next we need to create the dataloaders object that feeds image data into our model. This is exactly the same as before, but our path now points to a directory containing three folders (one for each type of painting).
```python
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path, bs=32)
```

And this is a sample of our paintings dataset.

```python
dls.show_batch(max_n=12)
```
Training the Model
We can now train the model just as before and check what the error rate looks like.
```python
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.244499 | 0.925823 | 0.238095 | 00:02 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.473010 | 0.469163 | 0.161905 | 00:02 |
| 1 | 0.309580 | 0.421583 | 0.142857 | 00:02 |
| 2 | 0.195176 | 0.416563 | 0.142857 | 00:02 |
This isn’t very good at all. Let’s try for a few more epochs.
```python
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(9)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.376832 | 0.975764 | 0.238095 | 00:02 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.427012 | 0.462371 | 0.114286 | 00:02 |
| 1 | 0.295496 | 0.281834 | 0.104762 | 00:02 |
| 2 | 0.195263 | 0.318176 | 0.114286 | 00:02 |
| 3 | 0.141423 | 0.455745 | 0.161905 | 00:02 |
| 4 | 0.104884 | 0.434516 | 0.161905 | 00:02 |
| 5 | 0.079488 | 0.444858 | 0.171429 | 00:00 |
| 6 | 0.061871 | 0.389154 | 0.161905 | 00:02 |
| 7 | 0.054440 | 0.380409 | 0.152381 | 00:02 |
| 8 | 0.044192 | 0.375198 | 0.152381 | 00:02 |
Even after 9 epochs the model seems to be struggling to learn about the three distinct types of painting.
Let’s try adding stronger augmentations, so that during training the input images (the paintings) are varied more dynamically through random rotations, zooms, brightness and contrast changes, warping, and flipping. This is designed to simulate the natural variability in real-world data. By exposing the model to a wider range of visual variations during training, stronger augmentations help prevent overfitting and make the model more robust to new, unseen images.
This will hopefully lead to an improvement in the model’s ability to generalize.
```python
item_tfms = Resize(224, method='squish')
batch_tfms = aug_transforms(mult=2.0, max_rotate=20, max_lighting=0.4, max_zoom=1.2)

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
).dataloaders(path, bs=32)
```

While we’re at it we can use a larger ResNet model and seek out an optimal learning rate (we will cover this in more detail in future lessons).
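First, though, it’s worth previewing what these stronger augmentations actually do. fastai’s unique=True option for show_batch displays the same training image repeatedly, each time with a different random transform applied (my own addition):

```python
# Repeat one training image with a different random augmentation each time
dls.train.show_batch(max_n=8, unique=True)
```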
```python
learn = vision_learner(dls, resnet50, metrics=error_rate)
learn.lr_find()
```
```
SuggestedLRs(valley=0.0012022644514217973)
```
Using the updated aug_transforms, learning rate, and larger model, let’s see if this makes a difference.
```python
learn.fine_tune(9, base_lr=1e-3)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.267471 | 0.366949 | 0.133333 | 00:03 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.656466 | 0.319643 | 0.114286 | 00:06 |
| 1 | 0.568277 | 0.303510 | 0.133333 | 00:07 |
| 2 | 0.491390 | 0.268793 | 0.085714 | 00:07 |
| 3 | 0.430013 | 0.237580 | 0.066667 | 00:05 |
| 4 | 0.355102 | 0.207443 | 0.057143 | 00:06 |
| 5 | 0.317612 | 0.193521 | 0.057143 | 00:07 |
| 6 | 0.284293 | 0.184749 | 0.057143 | 00:07 |
| 7 | 0.261323 | 0.202329 | 0.047619 | 00:07 |
| 8 | 0.246992 | 0.195494 | 0.047619 | 00:05 |
The error rate is now around 5%, which is much better, and the model seems to be converging rather than bouncing around as it did before.
Let’s see the results.
```python
learn.show_results()
```
Even though it is not perfect we can see that most of the predictions are correct! This is impressive as I was not sure that the model would be able to pick up on the difference between painting types given the limited input data and number of epochs.
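To dig into exactly which styles the model confuses with each other, fastai’s ClassificationInterpretation is handy (my own addition here; the course covers interpretation tools in later lessons):

```python
# Plot a confusion matrix of predicted vs. actual painting styles
# for the validation set
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
```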
Jeremy was right! You don’t necessarily need a huge dataset and long training times to get meaningful results.