Active Learning and Labeling by Roland Meertens

At Autonomous Intelligent Driving GmbH our mission is to deploy self-driving cars soon and safely. To do this we need data, lots of data. Like every company that tries to build self-driving vehicles, we gather this data by driving around and sending it to a labelling company to build a dataset.

If you often work with datasets for self-driving vehicles, you have probably encountered datasets such as Udacity's dataset or the KITTI dataset.

Unfortunately, these datasets only include data taken on a few specific days. If you perform well on these specific datasets, you don't know if you will also perform well in different situations. As we would like to drive in as many different situations as possible, we want to have labeled data from as many different situations as possible.
The same location can look different in summer and winter, and we should include both in our training set if we want to perform well in both situations.

We knew about this problem long ago and asked our labellers to add situation-specific labels to our dataset, so each image has associated situation labels. For example, we know for each image whether it was taken during the day, whether it was raining, whether there were roadworks, whether traffic participants were close or far away, and many more things. With these labels we can evaluate our machine learning solutions per situation, which tells us how well our algorithms perform in each situation and for which situations we still need to gather more data.
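A minimal sketch of such a per-situation evaluation is shown here. The record format, the tag names, and the `correct` flag are made up for illustration and are not AID's actual tooling.

```python
from collections import defaultdict


def accuracy_per_situation(records):
    """Return {situation: (accuracy, number_of_images)} for every situation tag.

    Each record is a dict with two hypothetical keys:
      "situations": set of situation tags attached by the labellers
      "correct":    True if the model under test handled the image correctly
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        for situation in record["situations"]:
            totals[situation] += 1
            hits[situation] += int(record["correct"])
    return {s: (hits[s] / totals[s], totals[s]) for s in totals}


# Toy data: every image carries several situation tags at once.
records = [
    {"situations": {"day", "close_traffic"}, "correct": True},
    {"situations": {"day", "rain"}, "correct": True},
    {"situations": {"night", "rain"}, "correct": False},
    {"situations": {"night", "roadworks"}, "correct": False},
    {"situations": {"day", "roadworks"}, "correct": True},
]

report = accuracy_per_situation(records)
for situation, (accuracy, count) in sorted(report.items()):
    print(f"{situation:15s} accuracy={accuracy:.2f} over {count} images")
```

Situations with low accuracy or very few images are the ones for which more data is worth gathering.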

The labels are also useful for selecting the data we still have to label. We can select data using a machine learning algorithm that predicts whether an image was taken in a situation for which we still need data. To do this we trained a neural network that recognizes specific situations. We used so-called "transfer learning" and trained on all labels at the same time. Transfer learning is the practice of taking an existing neural network trained on a specific task and retraining it on another task. In our case, we took a neural network trained on the ImageNet classes. Given an image, that network could say whether it was a picture of a cat, a specific type of dog, a pirate ship, and more.

A big reason for using transfer learning is that our data is sparse: for some labels we have fewer than 100 positive examples and thousands of negative examples, so training a separate neural network for each class is not viable. By using transfer learning we profit from the existing lower-level filters. By training on all classes at the same time, the gradients from other classes also influence the upper layers.

We used a pre-trained ResNet-50 architecture as the basis for our neural network. We appended another layer, which gives an output per class for every pixel, and added global average pooling to end up with labels per image.
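To make the pooling step concrete, here is a framework-free sketch of global average pooling. It only shows how per-pixel class scores collapse into one score per image; the class names and numbers are invented, and a real network would produce these maps with its final layer.

```python
def global_average_pool(class_maps):
    """Collapse scores of shape [classes][height][width] into one score per class."""
    pooled = []
    for class_map in class_maps:
        values = [value for row in class_map for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two hypothetical classes on a 2x3 grid of per-pixel scores.
class_maps = [
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],  # e.g. "rain": visible everywhere
    [[0.0, 0.0, 3.0], [0.0, 0.0, 0.0]],  # e.g. "roadworks": visible in one corner
]

image_scores = global_average_pool(class_maps)
print(image_scores)
```

Because the average runs over all positions, the output has one value per class regardless of the input image size.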

As described above, we train all labels simultaneously and use binary cross-entropy per label as our loss function. The advantage of this approach is that the whole network profits from the gradients of all of the labels. We first train only the final layer for one epoch and then, with each following epoch, include more of the lower layers in the optimization. This way the neural network does not immediately destroy the initially learned filters, but is gently nudged into using the existing knowledge it has.
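Both ideas can be sketched without a deep learning framework. The first function below computes binary cross-entropy averaged over the labels of one image. The second returns which layer groups would be optimized in a given epoch. The group names and the choice of unfreezing exactly one extra group per epoch are assumptions for illustration; the actual schedule is not specified here.

```python
import math


def binary_cross_entropy(probabilities, targets, eps=1e-7):
    """Mean binary cross-entropy over all labels of a single image."""
    total = 0.0
    for p, t in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


def trainable_groups(epoch, layer_groups):
    """Layer groups to optimize in a 0-based epoch, starting from the top.

    Assumption: one additional (lower) group is unfrozen per epoch.
    """
    count = min(epoch + 1, len(layer_groups))
    return layer_groups[-count:]


# Hypothetical grouping of a ResNet-style network, lowest layers first.
layer_groups = ["stem", "block1", "block2", "block3", "block4", "situation_head"]

# One image with three situation labels: predicted probabilities and ground truth.
loss = binary_cross_entropy([0.9, 0.2, 0.6], [1, 0, 1])
print(f"loss = {loss:.4f}")

for epoch in range(3):
    print(epoch, trainable_groups(epoch, layer_groups))
```

Each label contributes its own term to the loss, so an image can be positive for several situations at once, unlike with a softmax over mutually exclusive classes.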

To evaluate the output, we plotted the area under the curve on a test set for each label we have. Although the neural network does not perform perfectly for each class, the predicted tags are good enough to be used for pre-labelling.
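The text does not say which curve is meant. The sketch below assumes the ROC curve, whose area equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Predicted probabilities and ground truth for one hypothetical label.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect one. Running this once per label gives the per-label overview described above. The pairwise loop is quadratic, which is fine for a sketch but slow for large test sets.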

The trained classifier can be used for data selection and for evaluation of our other classifiers. We can select the data we want to label in a smart way, and try to prevent the sparse data problem. Felix Friedman wrote an article about the sparse data problem on the autonomous driving website. We can also evaluate the performance of our algorithms for several labels. If we notice that our car performs well when traffic participants are close, but can't reliably find cars when they are far away, we can either improve our algorithm to fit these situations, or gather more relevant data for these situations.
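One simple way to turn the classifier's predictions into a labelling queue is sketched below. The selection rule (rank images by their highest predicted probability for any under-represented situation), the target count, and all names are assumptions for illustration, not a description of AID's actual selection strategy.

```python
def select_for_labelling(predictions, labelled_counts, target_count, budget):
    """Pick the unlabelled images most likely to show under-represented situations.

    predictions:     {image_id: {situation: predicted probability}}
    labelled_counts: {situation: number of labelled positive examples so far}
    target_count:    situations with fewer examples than this count as "needed"
    budget:          how many images can be sent to the labellers
    """
    needed = {s for s, n in labelled_counts.items() if n < target_count}
    scored = []
    for image_id, probabilities in predictions.items():
        score = max((probabilities.get(s, 0.0) for s in needed), default=0.0)
        scored.append((score, image_id))
    # Highest score first; image id breaks ties so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [image_id for _, image_id in scored[:budget]]


labelled_counts = {"day": 5000, "rain": 80, "roadworks": 40}
predictions = {
    "img_001": {"day": 0.95, "rain": 0.05, "roadworks": 0.02},
    "img_002": {"day": 0.10, "rain": 0.90, "roadworks": 0.01},
    "img_003": {"day": 0.80, "rain": 0.10, "roadworks": 0.70},
    "img_004": {"day": 0.99, "rain": 0.01, "roadworks": 0.03},
}

queue = select_for_labelling(predictions, labelled_counts, target_count=1000, budget=2)
print(queue)
```

Here "day" already has plenty of examples, so only the rainy image and the roadworks image make it into the queue.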

If this sounds interesting and you are ready for a new challenge, come and join us!

About Roland Meertens

Roland Meertens studied artificial intelligence at the Radboud University. At AID he works on smart computer-vision algorithms for self-driving vehicles. Before joining AID he worked on neural machine translation, obstacle avoidance on small drones, and a social robot for elderly people. He sometimes publishes posts on his blog.