The same location can look very different in summer and winter, so if we want our models to perform well in both seasons, our training set has to include data from both.
We became aware of this problem early on and asked our labellers to add situation-specific labels to our dataset, so every image carries a set of situation labels. For example, we know for each image whether it was taken during the day, whether it was raining, whether there were roadworks, whether other traffic participants were close or far away, and much more. With these labels we can evaluate our machine learning solutions separately for each situation, which tells us both how well they perform under different conditions and for which situations we still need to gather more data.
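To make the per-situation evaluation concrete, here is a minimal Python sketch. It assumes a hypothetical record format in which each evaluated image carries its set of situation labels and a flag saying whether the model got that image right; the label names, the `per_situation_report` helper and the `MIN_IMAGES` threshold are made up for illustration.

```python
from collections import defaultdict

# Hypothetical evaluation records: the situation labels of each image and
# whether the model's prediction on that image was correct.
records = [
    {"situations": {"day", "rain"}, "correct": True},
    {"situations": {"day", "far_participants"}, "correct": False},
    {"situations": {"night", "close_participants"}, "correct": True},
    {"situations": {"day", "close_participants"}, "correct": True},
    {"situations": {"night", "far_participants", "rain"}, "correct": False},
]

def per_situation_report(records):
    """Return {situation: (num_images, accuracy)} for every situation label."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        # An image counts towards every situation it is labelled with.
        for situation in rec["situations"]:
            counts[situation] += 1
            hits[situation] += int(rec["correct"])
    return {s: (counts[s], hits[s] / counts[s]) for s in counts}

report = per_situation_report(records)

# Weakest situations first.
for situation, (n, acc) in sorted(report.items(), key=lambda kv: kv[1][1]):
    print(f"{situation:20s} n={n:3d} accuracy={acc:.2f}")

# Hypothetical threshold: below this many images, the accuracy is not
# trustworthy and the situation needs more data.
MIN_IMAGES = 3
sparse = sorted(s for s, (n, _) in report.items() if n < MIN_IMAGES)
print("needs more data:", sparse)
```

Sorting by accuracy puts the weakest situations at the top, while the `sparse` list highlights situations with too few images to judge reliably, which is exactly where more data should be collected.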
The trained classifier can be used both for data selection and for evaluating our other classifiers. It lets us choose the data we send for labelling in a smart way and helps us avoid the sparse data problem, which Felix Friedman has written about on the autonomous driving website (http://autonomous-driving.org/2018/05/25/dataset-management-for-machine-learning/). We can also evaluate the performance of our algorithms separately for each situation label. If we notice, for example, that our car performs well when traffic participants are close but cannot reliably detect cars that are far away, we can either improve our algorithm for these situations or gather more data that covers them.
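One way a situation classifier could drive data selection is sketched below in Python. This is an assumed heuristic, not necessarily the one we or the linked article use: each unlabelled image is scored by summing its predicted situation probabilities, each weighted by the inverse of how many labelled images already have that situation, and the highest-scoring images are sent for labelling first. The counts, image IDs and helper names are hypothetical.

```python
import heapq

# Hypothetical number of already-labelled images per situation label.
label_counts = {"day": 9000, "night": 800, "rain": 400, "roadworks": 50}

# Hypothetical unlabelled pool: image_id -> situation probabilities
# as predicted by the situation classifier.
pool = {
    "img_001": {"day": 0.95, "night": 0.05, "rain": 0.10, "roadworks": 0.02},
    "img_002": {"day": 0.10, "night": 0.90, "rain": 0.70, "roadworks": 0.05},
    "img_003": {"day": 0.90, "night": 0.10, "rain": 0.05, "roadworks": 0.85},
    "img_004": {"day": 0.97, "night": 0.03, "rain": 0.02, "roadworks": 0.01},
}

def rarity_score(probs, label_counts):
    """Sum of predicted situation probabilities, each weighted by the inverse
    of how many labelled images already have that situation (+1 avoids
    division by zero for situations we have never labelled)."""
    return sum(p / (label_counts.get(s, 0) + 1) for s, p in probs.items())

def select_for_labelling(pool, label_counts, budget):
    """Return the `budget` image IDs with the highest rarity score."""
    return heapq.nlargest(
        budget, pool, key=lambda img: rarity_score(pool[img], label_counts)
    )

selected = select_for_labelling(pool, label_counts, budget=2)
print(selected)
```

With these numbers, the likely-roadworks image `img_003` and the rainy night image `img_002` are chosen, while the ordinary daytime images are skipped. A refinement would be to update `label_counts` after each pick so that a single batch does not fill up with the same rare situation.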
If this sounds interesting and you are ready for a new challenge, come and join us!