We built a tool called Transimprove that takes as input labelled data with multiple labels per data point together with an existing model, and outputs corrected labels. We conducted multiple experiments with the DeepDIVA framework using different self-generated inconsistencies.
Deep Learning, Labelling, Data Quality, Noisy Labels, Inconsistent Labels, Consistency
In this project we investigate two effects of inconsistent labels on the accuracy of trained models: first, the impact of simply ignoring inconsistent labels; second, whether an existing external model can be used to correct inconsistent labels and, if so, whether this improves the accuracy of the trained models.
Although many high-quality datasets exist today for established classification tasks, pre-labelled datasets for new classification tasks are rare and mostly unavailable. Nevertheless, a high-quality dataset is needed to train accurate models: most deep learning algorithms tolerate only a limited fraction of false labels within the dataset.
Our results show two core findings: removing inconsistent data points from the training set decreases the accuracy of the trained model, whereas correcting inconsistent data points with an existing model significantly improves model accuracy compared to simple majority voting. It may therefore be beneficial to correct inconsistent labels with the help of specialised existing models instead of relabelling them via a crowdsourcing system or costly experts.
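The correction strategy described above can be sketched as follows. This is a minimal illustration, not the actual Transimprove implementation: the function names, the agreement threshold, and the dummy model are assumptions. Consistent data points keep their majority label, while inconsistent ones are deferred to the existing model.

```python
from collections import Counter

def correct_labels(multi_labels, model_predict, threshold=0.66):
    """Hypothetical sketch: keep labels where annotators largely agree,
    defer inconsistent data points to an existing model's prediction."""
    corrected = []
    for x, labels in multi_labels:
        votes = Counter(labels)
        label, count = votes.most_common(1)[0]
        if count / len(labels) >= threshold:
            corrected.append(label)             # consistent: trust the majority
        else:
            corrected.append(model_predict(x))  # inconsistent: ask the model
    return corrected

# toy example: three labels per data point, a dummy "model" that always returns 1
data = [("a", [0, 0, 0]), ("b", [0, 1, 2])]
print(correct_labels(data, lambda x: 1))  # -> [0, 1]
```

In contrast, simple majority voting would pick an arbitrary label for the inconsistent point "b", which is where the existing model adds value.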
Project duration: 1 Semester
Man hours invested: 360h (180h per person)
Team size: 2 Students
Institute for Interactive Technologies | FHNW
Philipp Lüthi (philipp.luethi@students.fhnw.ch)
Thibault Gagnaux (thibault.gagnaux@students.fhnw.ch)
Prof. Dr. Samuel Fricker (samuel.fricker@fhnw.ch)
Mr. Marcel Würsch (marcel.wuersch@fhnw.ch)