David Gwyer
May 29, 2025
This chapter is all about the building blocks of creating a successful model: in this case, a computer vision model that can recognise hand-written digits. We’ll cover the individual components that make up the overall model, and how they fit together to form a working system that can perform accurate inference.
We’ll be using the MNIST dataset in this lesson for making predictions about hand-written digits. The full dataset contains around 60,000 training images and 10,000 test images. However, for the purposes of this lesson, to keep things simple, we’ll use a sample of the full dataset and try to predict only the digits ‘3’ and ‘7’ (rather than the full range of digits ‘0’ through ‘9’).
Downloading the dataset is pretty straightforward using the untar_data() function from fastai. Once the images are downloaded we can view some of the folders and filenames.
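The post doesn’t include the download code itself; here is a minimal sketch, assuming the standard fastai star-import (the ‘3’ listing and the 12396 total below come from analogous calls):

from fastai.vision.all import *

# Download and extract https://s3.amazonaws.com/fast-ai-sample/mnist_sample.tgz
# (cached after the first run)
path = untar_data(URLs.MNIST_SAMPLE)
Path.BASE_PATH = path      # display paths relative to the dataset root

(path/'train'/'7').ls()    # produces the (#6265) listing below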
(#6265) [Path('train/7/7420.png'),Path('train/7/9878.png'),Path('train/7/47453.png'),Path('train/7/18966.png'),Path('train/7/27005.png'),Path('train/7/31957.png'),Path('train/7/14379.png'),Path('train/7/5811.png'),Path('train/7/33104.png'),Path('train/7/43686.png'),Path('train/7/58687.png'),Path('train/7/46356.png'),Path('train/7/4242.png'),Path('train/7/50455.png'),Path('train/7/54561.png'),Path('train/7/20105.png'),Path('train/7/2814.png'),Path('train/7/17185.png'),Path('train/7/38776.png'),Path('train/7/22313.png')...]
(#6131) [Path('train/3/47123.png'),Path('train/3/21559.png'),Path('train/3/17103.png'),Path('train/3/59660.png'),Path('train/3/59408.png'),Path('train/3/20738.png'),Path('train/3/8195.png'),Path('train/3/15109.png'),Path('train/3/54568.png'),Path('train/3/21075.png'),Path('train/3/20705.png'),Path('train/3/16811.png'),Path('train/3/43816.png'),Path('train/3/20869.png'),Path('train/3/42951.png'),Path('train/3/54020.png'),Path('train/3/48064.png'),Path('train/3/59996.png'),Path('train/3/44596.png'),Path('train/3/1476.png')...]
12396
As you can see, the MNIST_SAMPLE dataset only contains the digits ‘3’ and ‘7’, and the dataset has 12396 images in total. Let’s take a look at a sample hand-written ‘7’ image.
import os

img_path = path/'train'/'7'/os.listdir(path/'train'/'7')[0]  # first file in the '7' folder
img = PILImage.create(img_path)
img.show(figsize=(2,2));
Let’s take a look at a few more random sample ‘7’ digits.
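The sampling code isn’t shown for the ‘7’ digits, but it presumably mirrors the ‘3’ version that appears a little further down:

import random

digit_dir = path/'train'/'7'
sampled_files = random.sample(digit_dir.ls(), 9)   # 9 random '7' image paths
imgs = [PILImage.create(f) for f in sampled_files]
show_images(imgs, nrows=3, figsize=(5,5))
sampled_files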
[Path('train/7/17833.png'),
Path('train/7/24882.png'),
Path('train/7/56828.png'),
Path('train/7/56543.png'),
Path('train/7/10940.png'),
Path('train/7/23192.png'),
Path('train/7/19257.png'),
Path('train/7/30153.png'),
Path('train/7/10795.png')]
We can do the same for the ‘3’ hand-written digits.
digit_dir = path/'train'/'3'
sampled_files = random.sample(digit_dir.ls(), 9)
imgs = [PILImage.create(f) for f in sampled_files]
show_images(imgs, nrows=3, figsize=(5,5))
As you can see, these are all clearly hand-written ‘3’ and ‘7’ digits that are easily recognisable by humans. But how well can we train a deep learning model to achieve the same task? We’ll get to this a little later on.
Let’s store all ‘3’ and ‘7’ hand-drawn images into variables for convenience.
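This cell isn’t shown in the post; presumably it’s something like the following, with the threes and sevens names assumed:

threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()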
Neural network models work with numbers only, so let’s take a closer look at the numerical structure of our dataset. We’ll take an image path for a ‘3’ digit and load it as an image via the Python Imaging Library (PIL).
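A sketch of that step; the index and the slice bounds are assumptions, chosen to match the outputs shown:

im3_path = threes[1]         # an arbitrary '3' image path
im3 = Image.open(im3_path)   # a PIL image object
np.array(im3)[4:10, 4:10]    # a small slice of the underlying NumPy array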
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 29],
[ 0, 0, 0, 48, 166, 224],
[ 0, 93, 244, 249, 253, 187],
[ 0, 107, 253, 253, 230, 48],
[ 0, 3, 20, 20, 15, 0]], dtype=uint8)
Each image is made up of 28 x 28 pixels (784 in total), each holding a value from 0 to 255, which defines a grayscale image. In the NumPy array above we’re only showing pixel values for the top-left portion of the image. We can display the image data as a PyTorch tensor too.
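The corresponding cell would be a one-liner showing the same slice:

tensor(im3)[4:10, 4:10]   # fastai's tensor() converts the PIL image to a torch tensor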
tensor([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 29],
[ 0, 0, 0, 48, 166, 224],
[ 0, 93, 244, 249, 253, 187],
[ 0, 107, 253, 253, 230, 48],
[ 0, 3, 20, 20, 15, 0]], dtype=torch.uint8)
We can use a Pandas DataFrame to ‘color’ the grayscale values for a more intuitive visualization. In the plot below white pixels are stored as the number 0, black is the number 255, and shades of gray are between the two.
im3_t = tensor(im3)
df = pd.DataFrame(im3_t[4:15,4:22])
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 29 | 150 | 195 | 254 | 255 | 254 | 176 | 193 | 150 | 96 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 48 | 166 | 224 | 253 | 253 | 234 | 196 | 253 | 253 | 253 | 253 | 233 | 0 | 0 | 0 |
3 | 0 | 93 | 244 | 249 | 253 | 187 | 46 | 10 | 8 | 4 | 10 | 194 | 253 | 253 | 233 | 0 | 0 | 0 |
4 | 0 | 107 | 253 | 253 | 230 | 48 | 0 | 0 | 0 | 0 | 0 | 192 | 253 | 253 | 156 | 0 | 0 | 0 |
5 | 0 | 3 | 20 | 20 | 15 | 0 | 0 | 0 | 0 | 0 | 43 | 224 | 253 | 245 | 74 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 249 | 253 | 245 | 126 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 101 | 223 | 253 | 248 | 124 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 11 | 166 | 239 | 253 | 253 | 253 | 187 | 30 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 | 0 | 16 | 248 | 250 | 253 | 253 | 253 | 253 | 232 | 213 | 111 | 2 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 43 | 98 | 98 | 208 | 253 | 253 | 253 | 253 | 187 | 22 | 0 |
One approach to modelling the digit-classification problem is to take the average of each dataset (all the ‘3’ images and all the ‘7’ images), and use these averages to determine how similar individual images are to the ‘ideal’ image of each digit. Hopefully, this will lead to a simple but workable classifier for ‘3’ and ‘7’ digits.
First, we need to convert all the digits into PyTorch tensors, then stack all the individual ‘3’ and ‘7’ images, and finally take the average of each stack. Remember, converting a single image into a tensor gives us the same values we saw earlier (just a slice is shown).
tensor([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 29],
[ 0, 0, 0, 48, 166, 224],
[ 0, 93, 244, 249, 253, 187],
[ 0, 107, 253, 253, 230, 48],
[ 0, 3, 20, 20, 15, 0]], dtype=torch.uint8)
The shape of this tensor is:
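That output is missing from the post, but for a single 28 x 28 image it would be:

tensor(im3).shape   # torch.Size([28, 28])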
So we have just a single ‘3’ digit image in our tensor, but now let’s add all the other ‘3’ digits. We’ll do this by converting all the individual images from the ‘3’ dataset to PyTorch tensors one at a time and then storing them all in a standard Python list.
Here we make use of a Python list comprehension to do the tensor conversion and list generation.
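A sketch of that comprehension (the three_tensors name is taken from the display code later in the post):

three_tensors = [tensor(Image.open(o)) for o in threes]
len(three_tensors)   # 6131, one tensor per '3' image path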
So our newly generated list of ‘3’ digits has the same number of entries as the image paths list, and we can confirm that the first entry is the same as before.
tensor([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 29],
[ 0, 0, 0, 48, 166, 224],
[ 0, 93, 244, 249, 253, 187],
[ 0, 107, 253, 253, 230, 48],
[ 0, 3, 20, 20, 15, 0]], dtype=torch.uint8)
Let’s do the same for the ‘7’ digits too before moving on.
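Again the cell isn’t shown; presumably:

seven_tensors = [tensor(Image.open(o)) for o in sevens]
len(seven_tensors)   # 6265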
We can display a ‘3’ and ‘7’ image from each of our generated lists to make sure they look okay. Note that since the images are now PyTorch tensors we need to use the show_image() or show_images() function, otherwise Jupyter will just output numerical values.
matplotlib.rc('image', cmap='Greys')
show_images([three_tensors[1], seven_tensors[1]], figsize=(3,3));
To complete the calculation of the ideal (average) ‘3’ and ‘7’ digit we need to stack all the ‘3’ tensors and all the ‘7’ tensors into two new tensors, and then take the average of each one. We’ll use the PyTorch stack() and mean() functions for this.
So, to convert our lists of ‘3’ and ‘7’ tensors into individual stacked tensors we can use the stack function. While we’re at it, we’ll cast the pixel values from integers to floats (PyTorch requires floats when calculating means), and also normalize them to values between 0 and 1. This is pretty standard practice when image data is in float format.
stacked_threes = torch.stack(three_tensors).float()/255
stacked_sevens = torch.stack(seven_tensors).float()/255
stacked_threes.shape, stacked_sevens.shape
(torch.Size([6131, 28, 28]), torch.Size([6265, 28, 28]))
You can think of each new tensor as a vertical stack of 28 x 28 digit images, all on top of one another. Using the mean() function we can ‘collapse’ these into a single tensor by taking the mean of all the pixel values at each location in the image.
mean3 = stacked_threes.mean(0)
mean7 = stacked_sevens.mean(0)
show_images([mean3, mean7], figsize=(3,3));
The zero in stacked_threes.mean(0) specifies the dimension along which to calculate the mean. As our stacked tensors were piled on top of one another, we take the mean along the first dimension (dimension 0).
The resulting images above represent what the ‘ideal’ image for a ‘3’ and ‘7’ looks like. They appear chunkier than the individual digits as the darker areas are where the pixels align and are common to most images. The blurrier areas are where the pixels are less consistent over all images in the dataset.
Starting with the threes, we can calculate the mean distance between each pixel in a random digit selected from our dataset and the ‘ideal’ digit.
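The post never shows where a_3 (used below) comes from; presumably it’s a single image pulled from the stack, with the index an assumption:

a_3 = stacked_threes[1]   # an arbitrary '3' from the stack
show_image(a_3);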
dist_3_abs = (a_3 - mean3).abs().mean()
dist_3_sqr = ((a_3 - mean3)**2).mean().sqrt()
dist_3_abs,dist_3_sqr
(tensor(0.1114), tensor(0.2021))
Here we are using two variations of the mean to calculate the distance between our ‘3’ and the ideal three:

1. Mean absolute difference, or L1 norm
2. Root mean squared error (RMSE), or L2 norm
Both give us a sense of measure of the closeness between the selected digit and the ideal average. Let’s now compare the same digit with the ideal seven using both metrics.
dist_7_abs = (a_3 - mean7).abs().mean()
dist_7_sqr = ((a_3 - mean7)**2).mean().sqrt()
dist_7_abs,dist_7_sqr
(tensor(0.1586), tensor(0.3021))
We can use the calculated distances to determine whether the selected digit was a ‘3’ or a ‘7’; i.e. was it closer to the ideal three, or the ideal seven? Under both metrics the distance to the ideal ‘3’ was smaller, so we can ‘predict’ that the selected digit is in fact a three.
PyTorch provides (as you might expect!) ready functions to calculate the L1 and L2 norms, which produce the same result.
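These live in torch.nn.functional, available as F with the fastai imports; a sketch of the equivalent calls against the ideal seven:

F.l1_loss(a_3.float(), mean7), F.mse_loss(a_3, mean7).sqrt()   # matches (0.1586, 0.3021) above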
Note: Intuitively, the difference between L1 norm and mean squared error (MSE) is that the latter will penalize bigger mistakes more heavily than the former (and be more lenient with small mistakes).
Next we’ll look at using metrics to evaluate our predictions. We have already encountered two metrics: mean squared error and mean absolute error. While these are both useful for making predictions, the values themselves are not very intuitive. So in practice we use other metrics such as accuracy, which measures how often the model predicts the correct label; this is often the most direct way to evaluate classification performance.
When evaluating metrics we always use data that the model was NOT trained on. In the pixel similarity model we don’t yet have any trained components, but it is still useful practice. Let’s compile ‘stacked’ tensors of ‘3’ and ‘7’ digits as we did before, but this time using the validation data.
valid_3_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'3').ls()])
valid_7_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'7').ls()])
valid_3_tens.shape, valid_7_tens.shape
(torch.Size([1010, 28, 28]), torch.Size([1028, 28, 28]))
To calculate the accuracy metric we’ll need to do the following (each step is sketched below):

- Define a function to calculate the mean distance between a specific image (or stack of images) and an ‘ideal’ digit
- Confirm it works for a single image (compare to the previous value)
- Define a function to predict whether a digit is a ‘3’ or not
- Calculate the accuracy of predicting a ‘3’ or ‘7’, and the overall accuracy for all digits in the validation dataset
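The post omits the distance function; here is a minimal sketch that averages the absolute difference over the last two (pixel) dimensions, so it works for a single image or a whole stack:

def mnist_distance(a, b):
    # Mean over the last two axes; broadcasting lets this handle
    # both a single [28,28] image and an [N,28,28] stack.
    return (a - b).abs().mean((-1, -2))

mnist_distance(a_3, mean3)   # tensor(0.1114), matching the L1 value from before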
That’s the first two tasks done. We have a general function now to calculate the distance between a sample digit and the ‘ideal’ digit, and we’ve confirmed it matches the value from before. Now we need another function to make the prediction about a digit.
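A sketch of the prediction function: a digit is classed as a ‘3’ if it is closer to the ideal three than to the ideal seven:

def is_3(x):
    return mnist_distance(x, mean3) < mnist_distance(x, mean7)

is_3(a_3), is_3(a_3).float()   # (tensor(True), tensor(1.))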
And finally, let’s calculate the accuracies:
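A sketch of that calculation; note the validation tensors were loaded as raw integers above, so we normalize them here first:

accuracy_3s = is_3(valid_3_tens.float()/255).float().mean()
accuracy_7s = (1 - is_3(valid_7_tens.float()/255).float()).mean()
accuracy_3s, accuracy_7s, (accuracy_3s + accuracy_7s) / 2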
So we have 94% accuracy for the ‘3’ digits, 98% accuracy(!) for the ‘7’ digits, and a 96% accuracy rating overall for the two digits combined. We’ll compare these accuracy results with stochastic gradient descent in the next section.
Stochastic Gradient Descent (SGD) can be defined as the following set of steps. Here, weights are just some initially random parameters that we want to optimise to somehow improve our model predictions:

1. Initialize the weights.
2. For each image, use the weights to predict whether it is a ‘3’ or a ‘7’.
3. Based on these predictions, calculate how good the model is (its loss).
4. Calculate the gradient, which measures for each weight how changing it would change the loss.
5. Step (update) all the weights based on the gradients.
6. Go back to step 2 and repeat.
7. Stop when the model is good enough.
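First we need a single training tensor combining both stacks. The cell isn’t shown; a sketch, with the train_x name assumed:

train_x = torch.cat([stacked_threes, stacked_sevens])
train_x.shape   # torch.Size([12396, 28, 28])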
This concatenates the two separate stacks of ‘3’ digits and ‘7’ digits into one larger stacked tensor. The shape of this new rank-3 tensor is now 12396 x 28 x 28. That is, 12396 layers of 28 x 28 digits, if you want to think of it that way.
However, we also want the image data to be a long vector of 28*28=784 values rather than a 28 x 28 matrix. The data doesn’t change at all; we’re simply ‘flattening’ it to make it easier to feed as input to the neural network. We can use the PyTorch view function to do the rank-3 to rank-2 conversion.
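A sketch of the flattening step:

train_x = train_x.view(-1, 28*28)   # -1 tells view to infer that dimension
train_x.shape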
torch.Size([12396, 784])
We also need a label for each image. We’ll use 1 for 3s and 0 for 7s.
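A sketch of the label tensor, relying on the fact that train_x holds all the ‘3’ images first:

train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1)
train_y.shape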
torch.Size([12396, 1])
Here, we create a concatenated tensor of 1s and 0s, with counts corresponding to the lengths of the 3s and 7s tensors respectively. unsqueeze() is used to convert a rank-1 tensor of shape [12396] to a rank-2 tensor of shape [12396, 1]. A Dataset in PyTorch is required to return a tuple of (x,y) when indexed. Python provides a zip function which, when combined with list, provides a simple way to get this functionality.
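A sketch of that step (the slice indices in the last line are an assumption, chosen to show a non-zero region of the image):

dset = list(zip(train_x, train_y))
x, y = dset[0]
x[200:205], y   # a few pixels of the first flattened image, plus its label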
dset stores all the image/label pairs in a list of 12396 tuples. Remember, the first tuple item is a 784-element rank-1 tensor representing the original 28 x 28 image in flattened form, and the second tuple item is the label associated with the image.
(tensor([0.8078, 0.9961, 0.9961, 0.9961, 0.9961]), tensor([1]))
We need to do the same for the validation set.
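A sketch of the validation version; again we normalize, since the validation tensors are still raw integers:

valid_x = torch.cat([valid_3_tens, valid_7_tens]).float().view(-1, 28*28) / 255
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
valid_dset = list(zip(valid_x, valid_y))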
Now, let’s define a function to create the initial set of random weights for our model (one for every pixel).
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))
weights.shape
torch.Size([784, 1])
A sample of our generated weights for the first few pixels looks like this. Notice that gradient tracking is enabled for these weights (the grad_fn in the output below).
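The cell is presumably just a slice of the weights:

weights[:10]   # the first 10 of the 784 weights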
tensor([[ 1.9269],
[ 1.4873],
[ 0.9007],
[-2.1055],
[ 0.6784],
[-1.2345],
[-0.0431],
[-1.6047],
[-0.7521],
[ 1.6487]], grad_fn=<SliceBackward0>)
We also need to generate the initial value for the bias. This is a single number (a rank-1 tensor holding one value), as there is just one bias for our single-output linear function.
(torch.Size([784]), torch.Size([784, 1]), torch.Size([1, 784]))
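The bias-initialization cell isn’t shown in the post; presumably:

bias = init_params(1)   # a single trainable value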
We now have enough data defined that we can make a first prediction calculation. That is, multiply the first image in the training set by the random weights, sum them, and add the bias.
We basically want to do a dot product between the image data vector and the weights, and add the bias value. But multiplying vectors in PyTorch using the * operator is an element-wise operation only, so we need to perform a summation manually too.
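A sketch of that element-wise version; transposing the [784, 1] weights lets them broadcast against the 784-pixel vector:

(train_x[0] * weights.T).sum() + bias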
However, we can use the @ operator instead to do a full matrix multiplication (i.e. a dot product in this case) to achieve the same result.
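Which reduces the calculation to:

train_x[0] @ weights + bias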
And because it’s just matrix multiplication we can calculate the dot product (i.e. initial predictions) for all images in the training dataset just as easily. Notice that the first calculated value is the same as the one calculated above as this represents the first image in the dataset.
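A sketch, wrapping the calculation in a function (the linear name follows the post’s later reference to it):

def linear(xb):
    return xb @ weights + bias   # one prediction per row of xb

preds = linear(train_x)
preds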
tensor([[-12.6396],
[ -0.3468],
[ 9.8740],
...,
[-19.9456],
[ -1.6718],
[-23.8029]], grad_fn=<AddBackward0>)
And now we can determine how many of the predictions were correct. We treat every prediction above zero as a three and every prediction below zero as a seven; if the prediction matches the label, the result is True.
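A sketch of that comparison:

corrects = (preds > 0.0).float() == train_y
corrects.shape, corrects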
(torch.Size([12396, 1]),
tensor([[False],
[False],
[ True],
...,
[ True],
[ True],
[ True]]))
And the overall accuracy is the fraction of correct predictions.
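Which is simply the mean of the boolean tensor:

corrects.float().mean().item()   # roughly 0.5, i.e. no better than chance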
As expected this is not very good, since we’re starting with a random set of weights. To improve, we will need to use a loss function, calculate the gradients of the loss with respect to each weight, and use them to update the weights and hopefully improve the loss. We will put all this together into a complete training loop in the next section!
The training loop will consist of predictions, a loss value, gradient calculations, and weight updates. This will be repeated until the loss converges to an acceptable value. In order to calculate the loss we need a loss function.
The mnist_loss function defines a simple custom loss for binary classification by first applying a sigmoid to the model’s raw outputs to convert them into probabilities (values between 0.0 and 1.0). We use torch.where to compute the loss: for targets that are 1 (positive class), it returns 1 - prediction (penalizing underconfidence), and for targets that are 0 (negative class), it returns prediction (penalizing false positives). This creates a loss that encourages high probabilities for the correct class and low probabilities for the incorrect class. Finally, the loss is averaged over the batch.
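The definition isn’t included in the post, but the description above maps directly onto this implementation:

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()   # raw model outputs -> probabilities in (0, 1)
    return torch.where(targets == 1, 1 - predictions, predictions).mean()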
Before moving on it’s useful to clarify what Dataset and DataLoader are and how they work. For instance, we can feed a Python collection to a DataLoader and it will return an iterator over mini-batches.
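For example, with fastai’s DataLoader (the range and batch size are assumptions matching the output below):

coll = range(15)
dl = DataLoader(coll, batch_size=5, shuffle=True)
list(dl)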
[tensor([ 8, 0, 13, 3, 2]),
tensor([14, 4, 6, 7, 9]),
tensor([ 5, 1, 10, 12, 11])]
This is very convenient, and powerful! However, for training a model, we require a collection containing independent and dependent variables (inputs and targets of the model). A collection that contains tuples of independent and dependent variables is known in PyTorch as a Dataset. Here’s an example of an extremely simple Dataset:
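A sketch of such a Dataset, pairing each letter of the alphabet with its index (L is fastai’s enhanced list class, which produces the (#26) display below):

import string

ds = L(enumerate(string.ascii_lowercase))
ds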
(#26) [(0, 'a'),(1, 'b'),(2, 'c'),(3, 'd'),(4, 'e'),(5, 'f'),(6, 'g'),(7, 'h'),(8, 'i'),(9, 'j'),(10, 'k'),(11, 'l'),(12, 'm'),(13, 'n'),(14, 'o'),(15, 'p'),(16, 'q'),(17, 'r'),(18, 's'),(19, 't')...]
When we pass a Dataset to a DataLoader we will get back mini-batches which are themselves tuples of tensors representing batches of independent and dependent variables. You can think of a DataLoader as yielding batches of data.
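For example:

dl = DataLoader(ds, batch_size=6, shuffle=True)
list(dl)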
[(tensor([25, 11, 4, 1, 7, 21]), ('z', 'l', 'e', 'b', 'h', 'v')),
(tensor([19, 0, 8, 13, 16, 23]), ('t', 'a', 'i', 'n', 'q', 'x')),
(tensor([ 3, 6, 12, 17, 18, 2]), ('d', 'g', 'm', 'r', 's', 'c')),
(tensor([14, 9, 10, 15, 22, 5]), ('o', 'j', 'k', 'p', 'w', 'f')),
(tensor([24, 20]), ('y', 'u'))]
Let’s start to work on the training loop now. First we’ll re-initialize our parameters.
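Presumably the same initialization as before:

weights = init_params((28*28, 1))
bias = init_params(1)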
If you remember from earlier, we already created the training and validation datasets. These are lists of tuples, with each tuple containing a 784-element vector of pixel values and a label (1 for a ‘3’, 0 for a ‘7’). For example, the first tuple in the training dataset dset is (image data cropped):
(tensor([0.8588, 0.6510, 0.4627, 0.4627, 0.0235, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]),
tensor([1]))
We can now create the training and validation DataLoader objects.
dl = DataLoader(dset, batch_size=256)
valid_dl = DataLoader(valid_dset, batch_size=256)
xb,yb = first(dl)
xb.shape,yb.shape
(torch.Size([256, 784]), torch.Size([256, 1]))
Next, we define functions to calculate the gradients and to train for one epoch; that is, to cycle through all the mini-batches that our DataLoader yields.
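These definitions aren’t shown in the post; a minimal sketch:

def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()   # populate .grad on the weights and bias

def train_epoch(model, lr, params):
    for xb, yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad * lr   # the SGD step
            p.grad.zero_()          # reset gradients for the next batch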
We’ll also need functions to evaluate a single batch accuracy and to average this over all batches:
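A sketch of those two, with batch_accuracy named to match the metric passed to the Learner later on:

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds > 0.5) == yb
    return correct.float().mean()

def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb, yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)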
Let’s try this out and train for one epoch.
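A sketch; the learning rate of 1.0 is an assumption:

lr = 1.
params = weights, bias
train_epoch(linear, lr, params)
validate_epoch(linear)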
Let’s repeat for a few more epochs.
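Something like the following would produce the twenty accuracy values below:

for i in range(20):
    train_epoch(linear, lr, params)
    print(validate_epoch(linear), end=' ')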
0.5031 0.6842 0.7503 0.9013 0.9457 0.956 0.9618 0.9647 0.9667 0.9681 0.9701 0.9706 0.972 0.9725 0.973 0.9735 0.974 0.9745 0.975 0.975
Not bad. Our accuracy climbs to over 97% after twenty epochs, which is already better than the pixel similarity approach!
So far we’ve created almost all of the training and validation code from scratch. While this is very important for our understanding, PyTorch provides some useful classes to make things easier to implement. The first thing we’ll do is use the nn.Linear module, which does the same thing as our init_params and linear functions together: it contains both the weights and biases in a single class.
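Creating one is a one-liner:

linear_model = nn.Linear(28*28, 1)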
We can see what parameters are available in our linear module.
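For example:

w, b = linear_model.parameters()
w.shape, b.shape   # (torch.Size([1, 784]), torch.Size([1]))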
We can use this to create a basic optimizer.
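The class definition isn’t shown; a minimal sketch matching the BasicOptim name referenced later:

class BasicOptim:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self, *args, **kwargs):
        for p in self.params:
            p.data -= p.grad.data * self.lr   # gradient-descent update

    def zero_grad(self, *args, **kwargs):
        for p in self.params:
            p.grad = None                     # clear gradients for the next step

opt = BasicOptim(linear_model.parameters(), lr)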
And thus our single epoch training loop can be simplified to:
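A sketch:

def train_epoch(model):
    for xb, yb in dl:
        calc_grad(xb, yb, model)
        opt.step()
        opt.zero_grad()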
We can try this out for multiple epochs using a simple loop inside a function.
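For example:

def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')

train_model(linear_model, 20)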
0.4932 0.83 0.8598 0.9194 0.9394 0.9536 0.9629 0.9662 0.9682 0.9706 0.9726 0.9741 0.9745 0.9755 0.977 0.977 0.9775 0.9775 0.9779 0.9784
We can generalize this more by using the fastai SGD class, which essentially does the same as the BasicOptim class.
linear_model = nn.Linear(28*28,1)
opt = SGD(linear_model.parameters(), lr)
train_model(linear_model, 20)
0.4932 0.791 0.8696 0.9223 0.9419 0.9546 0.9633 0.9672 0.9692 0.9721 0.9726 0.9741 0.9736 0.9755 0.977 0.977 0.9779 0.9779 0.9779 0.9784
Another abstraction fastai provides is Learner.fit, which we can use instead of train_model. To create a Learner we first need to create a DataLoaders object, by passing in our training and validation DataLoaders.
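A sketch of that step:

dls = DataLoaders(dl, valid_dl)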
We can then pass this into Learner() using some of the functionality we defined earlier, and then call the fit() method to begin training for a specified number of epochs.
learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD, loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(10, lr=lr)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.636727 | 0.503329 | 0.495584 | 00:00 |
1 | 0.471677 | 0.220506 | 0.803729 | 00:00 |
2 | 0.175191 | 0.163141 | 0.854269 | 00:00 |
3 | 0.077810 | 0.100827 | 0.915604 | 00:00 |
4 | 0.041945 | 0.074958 | 0.935231 | 00:00 |
5 | 0.027924 | 0.060686 | 0.947988 | 00:00 |
6 | 0.022141 | 0.051695 | 0.955348 | 00:00 |
7 | 0.019544 | 0.045649 | 0.963690 | 00:00 |
8 | 0.018206 | 0.041354 | 0.965653 | 00:00 |
9 | 0.017388 | 0.038157 | 0.967125 | 00:00 |
There are just a couple more things we need to do to make this a ‘proper’ neural network: add a non-linear activation function, and another layer. This adds sufficient (non-linear) complexity to help the model learn better weights, and hence achieve better accuracy. Instead of using nn.Linear we use nn.Sequential so we can easily ‘compose’ neural networks comprised of multiple layers.
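The network definition isn’t shown; a sketch with 30 hidden activations (the layer width is an assumption):

simple_net = nn.Sequential(
    nn.Linear(28*28, 30),   # first linear layer: 784 pixels -> 30 activations
    nn.ReLU(),              # the non-linearity
    nn.Linear(30, 1),       # second linear layer: 30 activations -> 1 output
)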
Then we create our Learner and call the fit function as before, except this time we are using an additional layer.
learn = Learner(dls, simple_net, opt_func=SGD, loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(40, 0.1)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.317320 | 0.410194 | 0.504416 | 00:00 |
1 | 0.148899 | 0.226347 | 0.809127 | 00:00 |
2 | 0.082341 | 0.113844 | 0.916585 | 00:00 |
3 | 0.054091 | 0.077327 | 0.942591 | 00:00 |
4 | 0.040992 | 0.060660 | 0.955839 | 00:00 |
5 | 0.034286 | 0.051260 | 0.964181 | 00:00 |
6 | 0.030415 | 0.045308 | 0.965653 | 00:00 |
7 | 0.027889 | 0.041224 | 0.966634 | 00:00 |
8 | 0.026063 | 0.038250 | 0.968106 | 00:00 |
9 | 0.024646 | 0.035980 | 0.969087 | 00:00 |
10 | 0.023498 | 0.034179 | 0.971541 | 00:00 |
11 | 0.022539 | 0.032703 | 0.973013 | 00:00 |
12 | 0.021721 | 0.031463 | 0.973994 | 00:00 |
13 | 0.021015 | 0.030397 | 0.974485 | 00:00 |
14 | 0.020396 | 0.029464 | 0.974975 | 00:00 |
15 | 0.019849 | 0.028638 | 0.974975 | 00:00 |
16 | 0.019360 | 0.027900 | 0.976448 | 00:00 |
17 | 0.018920 | 0.027235 | 0.976448 | 00:00 |
18 | 0.018522 | 0.026633 | 0.976938 | 00:00 |
19 | 0.018158 | 0.026085 | 0.977429 | 00:00 |
20 | 0.017824 | 0.025587 | 0.977429 | 00:00 |
21 | 0.017515 | 0.025131 | 0.977429 | 00:00 |
22 | 0.017229 | 0.024712 | 0.978410 | 00:00 |
23 | 0.016962 | 0.024328 | 0.978901 | 00:00 |
24 | 0.016712 | 0.023974 | 0.979392 | 00:00 |
25 | 0.016477 | 0.023646 | 0.979392 | 00:00 |
26 | 0.016256 | 0.023343 | 0.979392 | 00:00 |
27 | 0.016046 | 0.023061 | 0.979882 | 00:00 |
28 | 0.015848 | 0.022799 | 0.980864 | 00:00 |
29 | 0.015659 | 0.022555 | 0.980864 | 00:00 |
30 | 0.015480 | 0.022328 | 0.980864 | 00:00 |
31 | 0.015309 | 0.022115 | 0.981845 | 00:00 |
32 | 0.015146 | 0.021915 | 0.982336 | 00:00 |
33 | 0.014990 | 0.021727 | 0.981845 | 00:00 |
34 | 0.014840 | 0.021551 | 0.981845 | 00:00 |
35 | 0.014697 | 0.021384 | 0.982336 | 00:00 |
36 | 0.014560 | 0.021228 | 0.982336 | 00:00 |
37 | 0.014427 | 0.021079 | 0.981845 | 00:00 |
38 | 0.014300 | 0.020939 | 0.981845 | 00:00 |
39 | 0.014178 | 0.020806 | 0.981845 | 00:00 |
This pushes us up to 98% accuracy with a two-layer neural network! What would happen if we used, say, an 18-layer network?
dls = ImageDataLoaders.from_folder(path)
learn = vision_learner(dls, resnet18, pretrained=False,
loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.093842 | 0.011492 | 0.996075 | 00:10 |
Now we have achieved almost 100% accuracy, which demonstrates the power of deep learning neural networks! Even though this is a fairly simple model by today’s standards, it’s still a useful exercise for getting an early feel for training models from scratch and producing high-quality results.
If you liked this post please consider following me on Twitter and LinkedIn for more AI content.