Comparison of the accuracy of VGG16 and ResNet50 for image classification

Introduction

Several convolutional neural network (CNN) architectures have been developed for image classification tasks. One of the earliest, LeNet, was developed in 1998 and trained on the MNIST dataset for handwritten digit recognition. Over time, other pretrained models such as AlexNet in 2012, VGG in 2014, and ResNet in 2015 were developed and trained on the ImageNet dataset. These pretrained models use different filter sizes, numbers of convolution layers, activation functions, and fully connected (dense) layers. The aim of this project is therefore to compare the accuracy of two commonly used architectures, VGG16 and ResNet50, in classifying images by utilizing their weights trained on the ImageNet dataset.

Getting started by importing the VGG16 model

Using TensorFlow 2 with Keras, we will import the VGG16 model with the code below:

from tensorflow.keras.applications.vgg16 import VGG16  # VGG16 architecture
from tensorflow.keras.preprocessing import image  # image loading utilities
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions  # input preprocessing and label decoding
import numpy as np
model = VGG16(weights='imagenet')  # loads the ImageNet weights
model.summary()  # summarizes the layers and parameters of the model

[Output: VGG16 model summary]

From the summary above, VGG16 has 13 convolution layers, 3 dense layers, and 138,357,544 parameters.
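As a quick sanity check (a minimal sketch, assuming the model object loaded above is still in scope), we can count the convolution and dense layers programmatically and confirm the expected input size:

from tensorflow.keras import layers
conv_layers = [l for l in model.layers if isinstance(l, layers.Conv2D)]
dense_layers = [l for l in model.layers if isinstance(l, layers.Dense)]
print(len(conv_layers), len(dense_layers), model.count_params())  # 13 3 138357544
print(model.input_shape)  # (None, 224, 224, 3) - the reason images are resized to 224x224 later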

Download and load test images

We will download a zipped file containing the test images (inputs). The images are extracted and preprocessed before being fed into the model.

!wget https://moderncomputervision.s3.eu-west-2.amazonaws.com/imagesDLCV.zip #downloads the file
!unzip imagesDLCV.zip #unzips the file
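The ! shell commands above assume a notebook environment such as Google Colab. A minimal pure-Python alternative (a sketch using the same URL) would be:

import urllib.request, zipfile
urllib.request.urlretrieve('https://moderncomputervision.s3.eu-west-2.amazonaws.com/imagesDLCV.zip', 'imagesDLCV.zip')  # downloads the file
with zipfile.ZipFile('imagesDLCV.zip') as zf:
    zf.extractall('.')  # extracts the archive into the current directory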

To get the file names of the extracted images:

import cv2
from os import listdir
from os.path import isfile, join

# Get the image files located in the ./images/class1/ folder
mypath = "./images/class1/"  # path of the folder containing the images
file_names = [f for f in listdir(mypath) if isfile(join(mypath, f))]
print(file_names)

[Output: list of image file names]

Feeding the inputs into the model

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 16))
all_top_classes = []  # stores the top-5 predicted labels of each image

# Loop through the images and run them through the model
for (i, file) in enumerate(file_names):
    # Image preprocessing: resize, convert to array, add a batch dimension, normalize
    img = image.load_img(mypath + file, target_size=(224, 224))  # VGG16 expects 224x224 inputs
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)

    # Load the image with OpenCV for plotting
    img2 = cv2.imread(mypath + file)

    # Get predictions
    preds = model.predict(x)
    predictions = decode_predictions(preds, top=5)[0]
    all_top_classes.append([p[1] for p in predictions])

    # Plot the image with its predicted labels
    sub = fig.add_subplot(len(file_names), 1, i + 1)
    sub.set_title(f'Predicted {str(predictions)}')
    plt.axis('off')
    plt.imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))

plt.show()  # outputs the images with their predicted labels

[Output: test images with their top-5 VGG16 predictions]
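Note that the loop above calls model.predict once per image. A batched variant (a minimal sketch, reusing mypath, file_names, and the preprocessing helpers from above) would make a single call instead:

# Batched alternative (sketch): preprocess all images, then predict in one call
batch = np.vstack([
    preprocess_input(np.expand_dims(
        image.img_to_array(image.load_img(mypath + f, target_size=(224, 224))), axis=0))
    for f in file_names
])
batch_preds = model.predict(batch)                   # shape: (number of images, 1000)
batch_top5 = decode_predictions(batch_preds, top=5)  # one list of top-5 tuples per image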

We can view the top-5 labels predicted for each image by printing all_top_classes.

[Output: top-5 predicted labels of each image]
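For readability, each file name can be paired with its predictions (assuming file_names and all_top_classes are still in scope):

for file, labels in zip(file_names, all_top_classes):
    print(file, labels)  # file name followed by its top-5 predicted class names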

Creating the ground truth labels

We will create a list containing the true label for each image, in the same order in which the images were fed to the model (i.e., the order of file_names).

ground_truth = ['collie','basketball','beer_glass','doormat','limousine','spider_web','burrito','Christmas_stocking','German_shepherd']

Creating a function that outputs the accuracy of the model

Since we have a list of the top-5 predicted labels for each image, we will compare these labels to the ground truth labels. In other words, we will check whether the ground truth label for each image falls within its top-5 predicted labels (rank-5 accuracy).

def getScore(all_top_classes, ground_truth, N):
  # Calculate the rank-N accuracy: the fraction of images whose true label
  # appears among the top-N predicted labels
  in_labels = 0
  for (i, labels) in enumerate(all_top_classes):
    if ground_truth[i] in labels[:N]:
      in_labels += 1
  return f'Rank-{N} Accuracy = {in_labels/len(all_top_classes)*100:.2f}%'

print(getScore(all_top_classes, ground_truth, 5))

[Output: Rank-5 Accuracy = 88.89%]
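The same function can also report the stricter rank-1 accuracy, which counts a prediction as correct only when the single top label matches the ground truth:

print(getScore(all_top_classes, ground_truth, 1))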

Importing ResNet50 for image classification

We will load the model in a similar way.

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights='imagenet')
model.summary()

[Output: ResNet50 model summary]
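From the summary, ResNet50 has 25,636,712 parameters, far fewer than VGG16's 138,357,544 despite being a much deeper network, largely because it replaces VGG16's large fully connected layers with global average pooling. A quick check (assuming the model variable now holds ResNet50):

print(f'ResNet50 parameters: {model.count_params():,}')  # 25,636,712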

Feeding the inputs into the ResNet50 model

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 16))
all_top_classes = []  # stores the top-5 predicted labels of each image

# Loop through the images and run them through the classifier
for (i, file) in enumerate(file_names):
    # Image preprocessing: ResNet50 also expects 224x224 inputs
    img = image.load_img(mypath + file, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)

    # Load the image with OpenCV for plotting
    img2 = cv2.imread(mypath + file)

    # Get predictions
    preds = model.predict(x)
    predictions = decode_predictions(preds, top=5)[0]
    all_top_classes.append([p[1] for p in predictions])

    # Plot the image with its predicted labels
    sub = fig.add_subplot(len(file_names), 1, i + 1)
    sub.set_title(f'Predicted {str(predictions)}')
    plt.axis('off')
    plt.imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))

plt.show()

[Output: test images with their top-5 ResNet50 predictions]

Getting the top-5 predicted labels for each image:

[Output: top-5 predicted labels of each image]

Calculating the accuracy of the model:

def getScore(all_top_classes, ground_truth, N):
  # Calculate the rank-N accuracy (same function as defined earlier)
  in_labels = 0
  for (i, labels) in enumerate(all_top_classes):
    if ground_truth[i] in labels[:N]:
      in_labels += 1
  return f'Rank-{N} Accuracy = {in_labels/len(all_top_classes)*100:.2f}%'

print(getScore(all_top_classes, ground_truth, 5))

[Output: Rank-5 Accuracy = 100.00%]

Conclusion

The two models (VGG16 and ResNet50) produced different accuracy results: the rank-5 accuracy of VGG16 on the test images was 88.89%, while the rank-5 accuracy of ResNet50 was 100%. Thus, ResNet50 was better suited for this image classification task. There were other differences between the two models, such as the number of parameters, with VGG16 and ResNet50 having 138,357,544 and 25,636,712 parameters respectively. Over the years, developers have used different hyperparameters (activation function, learning rate, loss function) when building models for image classification. Hence, it is important to evaluate several models on an image classification task in order to obtain the best result.