In recent years, deep learning has produced breakthroughs in several fields of computer science, including image recognition. These advances show up as measurable performance gains, which the ImageNet database and challenge illustrate well.

ImageNet

The full ImageNet database contains about 14.2 million images organized into more than 20,000 categories (e.g. types of plants, animals, natural objects, and people). It is the basis for the ImageNet Challenge, which uses a subset of 1,000 categories: researchers train their models and submit predictions of the category each image belongs to. The graph below shows the error of the winning algorithm each year, from 2010 to 2016:

Photo: Own elaboration (2019).

The first thing to notice is that the error decreased steadily each year; in 2015 the models surpassed human performance levels. The biggest leap in performance came between 2011 and 2012, a year that marked a big change in the kind of model used: instead of classic machine learning algorithms such as Support Vector Machines (SVMs, shown in blue), the winners started to use deep learning, more specifically Convolutional Neural Networks (CNNs, shown in orange).

The main caveat of CNN models is that they need a lot of data to perform well, with datasets regularly containing over 1 million images. This behavior is not exclusive to CNNs and image processing; it also occurs in other fields such as Natural Language Processing (NLP). In fact, it is the flip side of the main “superpower” of deep learning compared to classical machine learning models: performance keeps improving as more data is added, as the picture below shows (traditional models tend to plateau after some threshold):

Photo: Sumologic (2019).

Transfer learning

So, if deep learning models need a lot of data, what should a company do when it wants to use them but doesn’t have much data? The answer is transfer learning, a method that transfers knowledge learned on one problem to similar problems. The most common type of transfer learning is fine-tuning, where you take a model pre-trained on a larger dataset (like ImageNet) and adapt it to your smaller dataset.

Before going deeper into fine-tuning, it helps to understand how deep learning algorithms work. A deep learning model is a neural network with many layers (hence the adjective “deep”); models generally contain from 50 to 200 layers, although bigger ones appear all the time. Each layer is represented by a series of numbers, called weights, and training a model means finding the weights that best solve the problem. The difficulty is that a full model has a lot of weights (around 10 to 200 million), and it is hard to estimate all of those numbers from just a small amount of data.
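To get a feeling for how quickly weights add up, here is a minimal sketch that counts them for a small fully connected network. The layer sizes are assumptions picked for illustration (they do not come from any real model), and the function name `count_weights` is hypothetical. Real image models use convolutional layers, which are counted differently, but the idea is the same: every connection between layers is a number that training has to find.

```python
# Count the weights of a small fully connected network.
# The layer sizes below are illustrative assumptions, not a real model.

def count_weights(layer_sizes):
    """Return the number of weights and biases in a dense network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between two layers
        total += n_out         # one bias per output unit
    return total

small_net = [784, 512, 512, 10]  # input, two hidden layers, output
print(count_weights(small_net))  # 669706
```

Even this toy network with two hidden layers has more than half a million numbers to learn, which gives an idea of why a model with dozens of layers is hard to train from a few hundred images.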

It depends on your amount of data

Going back to fine-tuning, the specific way to adapt a pre-trained model to your problem depends on the amount of data you have. If you have a small number of pictures, you should take the pre-trained model as a whole and retrain only the last layer to solve your specific problem (e.g. instead of classifying images into 1,000 categories, your problem could be to detect only whether an image contains a human).
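The idea of retraining only the last layer can be sketched in plain Python. This is a toy illustration under loud assumptions: `FROZEN_WEIGHTS` is a hypothetical stand-in for a pre-trained network (a real one would be loaded from a file), its values were chosen so that the features are useful for the toy task, and the dataset is random two-number inputs rather than images. What matters is the structure: the frozen layers are only used to compute features, and gradient descent updates nothing but the new last layer.

```python
import math
import random

# Stand-in for the pre-trained layers: these weights are never updated.
# (Hypothetical values, chosen so the features suit the toy task below.)
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]

def frozen_features(x):
    """Run the frozen layers: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]

def predict(head, bias, x):
    """New last layer: logistic regression on top of the frozen features."""
    z = sum(w * f for w, f in zip(head, frozen_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=200, lr=0.5):
    """Fit only the last layer's weights with stochastic gradient descent."""
    head, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x)
            error = predict(head, bias, x) - y  # gradient of the log loss w.r.t. z
            head = [w - lr * error * f for w, f in zip(head, feats)]
            bias -= lr * error
    return head, bias

# Toy dataset: the label is 1 when the first input is larger than the second.
random.seed(0)
data = []
for _ in range(40):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, 1 if x[0] > x[1] else 0))

head, bias = train_head(data)
accuracy = sum((predict(head, bias, x) > 0.5) == bool(y) for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Only four numbers are learned here (three weights and one bias), which is why this approach can work with very little data: the hard part of the job was already done when the frozen layers were trained.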

If you have a moderate amount of data, you should keep the first layers as they are (without changing their weights) and retrain the middle and final layers (the number of layers to train grows as your quantity of data grows). Finally, even if you have a lot of data, transfer learning is useful to reduce your training time and computing cost. In this situation, you should retrain the whole model, but use the pre-trained weights as a starting point instead of random weights.
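The three situations described above can be summarized as a small decision rule. The sketch below is only an illustration: the function `plan_fine_tuning`, the layer names, and especially the thresholds `small` and `large` are hypothetical, since the right cut-off points depend on the task and the model. It returns, for each layer, whether it should be retrained or kept frozen.

```python
def plan_fine_tuning(layer_names, n_images, small=1_000, large=100_000):
    """Return {layer_name: trainable?} for a pre-trained model.

    `small` and `large` are hypothetical thresholds chosen for illustration;
    sensible values depend on the task and the model.
    """
    n_layers = len(layer_names)
    if n_images < small:
        n_trainable = 1          # little data: retrain only the last layer
    elif n_images >= large:
        n_trainable = n_layers   # lots of data: retrain everything,
                                 # starting from the pre-trained weights
    else:
        # Moderate data: unfreeze more layers (from the end) as data grows.
        fraction = (n_images - small) / (large - small)
        n_trainable = max(1, min(n_layers, round(fraction * n_layers)))
    first_trainable = n_layers - n_trainable
    return {name: i >= first_trainable for i, name in enumerate(layer_names)}

layers = ["conv1", "conv2", "conv3", "conv4", "fc"]  # illustrative names
for n in (500, 50_000, 1_000_000):
    plan = plan_fine_tuning(layers, n)
    print(n, [name for name, trainable in plan.items() if trainable])
```

In a framework such as PyTorch, a plan like this would typically be applied by setting `requires_grad = False` on the parameters of the frozen layers, so that the optimizer leaves them untouched.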

In my next post, I will show you where to find those pre-trained models and how to import and use them, using open source models and data, to solve a new problem.

About the author

Luiz Nonenmacher is a Data Scientist at Poatek. He has a Master’s Degree in Production Engineering in the area of Machine Learning and Quantitative Methods. In his spare time, he likes to read about a lot of different subjects, including (but not limited to) classical philosophy (especially Stoicism), science, history, fantasy, and science fiction.