Automated Data Quality Assessment for Deep Learning

Summary

We built a tool called Transimprove that takes labelled data with multiple labels per data point, together with an existing model, as input and outputs corrected labels. We conducted multiple experiments with the DeepDIVA framework, using different self-generated label inconsistencies.
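
Transimprove's exact interface is not shown here; the following Python sketch only illustrates the underlying idea, with correct_labels and threshold as assumed, illustrative names: where the annotators agree strongly enough, their majority label is kept, and where they do not, the prediction of the existing model is used instead.

    import numpy as np
    from collections import Counter

    def correct_labels(multi_labels, model_predictions, threshold=0.8):
        """For each data point with several labels, keep the majority
        label when the annotators agree at least `threshold` of the
        time; otherwise fall back to the existing model's prediction.
        (Illustrative sketch, not the project's actual API.)"""
        corrected = []
        for labels, model_label in zip(multi_labels, model_predictions):
            label, count = Counter(labels).most_common(1)[0]
            if count / len(labels) >= threshold:
                corrected.append(label)        # consistent: trust the annotators
            else:
                corrected.append(model_label)  # inconsistent: trust the model
        return np.array(corrected)

For example, correct_labels([[1, 1, 1], [0, 1, 2]], [1, 2]) keeps the unanimous label 1 for the first data point and replaces the inconsistent labels of the second with the model's prediction 2.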

Keywords

Deep Learning, Labelling, Data Quality, Noisy Labels, Inconsistent Labels, Consistency

Objective

In this project, we investigate two effects of inconsistent labels on the accuracy of trained models: first, we examine the impact of simply ignoring inconsistent labels; second, we explore whether an external, existing model can be used to correct inconsistent labels and, if so, whether this improves the accuracy of the trained models.

Initial Position

Today, many high-quality datasets exist for established classification tasks. In practice, however, pre-labelled datasets are rare and mostly unavailable for new classification tasks. Nevertheless, a high-quality dataset is needed to train accurate models, as most deep learning algorithms can only tolerate a limited amount of falsely labelled data.

Results

Our results show two core findings: removing inconsistent data points from the training set decreases the accuracy of a trained model, whereas correcting inconsistent data points with an existing model significantly improves model accuracy compared to simple majority voting. It may therefore be beneficial to correct inconsistent labels with the help of specialised existing models instead of relabelling them through a crowdsourcing system or with costly experts.
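
The removal strategy behind the first finding can be sketched in the same hedged style (drop_inconsistent and threshold are again illustrative names, not the project's API): data points whose annotators disagree too much are simply discarded, which shrinks the training set.

    import numpy as np
    from collections import Counter

    def drop_inconsistent(X, multi_labels, threshold=0.8):
        """Keep only data points whose annotators agree on one label at
        least `threshold` of the time; inconsistent points are dropped
        together with their inputs. (Illustrative sketch.)"""
        kept_X, kept_y = [], []
        for x, labels in zip(X, multi_labels):
            label, count = Counter(labels).most_common(1)[0]
            if count / len(labels) >= threshold:
                kept_X.append(x)
                kept_y.append(label)
        return np.array(kept_X), np.array(kept_y)

Compared with correct_labels above, this variant never consults a model but pays for it with fewer training examples, which is the trade-off the experiments measured.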

Project facts

Project duration: 1 semester
Man-hours invested: 360h (180h per person)
Team size: 2 students

Customer

Institute for Interactive Technologies | FHNW

Project team

Philipp Lüthi (philipp.luethi@students.fhnw.ch)
Thibault Gagnaux (thibault.gagnaux@students.fhnw.ch)

Contact

Prof. Dr. Samuel Fricker (samuel.fricker@fhnw.ch)
Mr. Marcel Würsch (marcel.wuersch@fhnw.ch)
