What is transfer learning? — Computer Vision Division

When trying to find options regarding what network architectures and learning methods we could use, we ran into NVIDIA’s Transfer Learning Toolkit. Transfer learning is the process in which features are extracted from a network which was previously trained to perform some task, in order to feed them into a new network capable of solving a similar problem with new data. This technique is currently very popular in the deep learning realm because it enables deep networks to be trained with a small data set, which comes in handy in real life problems where there aren’t millions of tagged images to use in order to train a model.

But how is this done? Well, first we need to talk about Convolutional Neural Networks and we will do so by looking at an oversimplified diagram of one of this networks.

Image taken from: https://docs.ecognition.com/v9.5.0/eCognition_documentation/Reference%20Book/23%20Convolutional%20Neural%20Network%20Algorithms/Convolutional%20Neural%20Network%20Algorithms.html

In convolutional neural networks (known as CNNs) three separate types of layers can be seen: convolution layers, pooling layers and the classifier layers. Convolution and pooling layers make up the first and middle stages of the CNN as can be seen above, and by virtue of a few mathematical operations are able to recognize edges, mostly associated to convolutional layers, and shapes, mostly associated to pooling layers, in input images. Finally, this data is fed to one or more classifier layers which classify the objects in the images according to pre-fed classes.

In transfer learning the pooling and convolutional layers are mearly used, and only the final classifier layers are retrained in order for the network to recognize the new classes that might have been added in our dataset for a given problem.

Why use transfer learning? Well, for one there’s no way we can generate and tag the millions of images needed to train a neural net from zero. Tagging itself is extremely time consuming and even if we could store the millions of images needed, tagging itself would take a lot of time. Transfer learning significantly reduces the size of the training dataset needed given the network was already previously trained with an extensive dataset. Furthermore, TLT already has a few models trained to recognize pedestrians and other similar classes to the ones we are trying to detect, so retraining should be a fairly straightforward process.