LabelUX: Guidelines to support software engineers to design data labeling systems
Abstract
The demand for systems that use artificial intelligence has grown substantially in recent years, especially for those based on Machine Learning (ML) techniques. Systems that use supervised ML techniques need representative and correctly categorized data to ensure their quality. In this context, the data labeling step plays a fundamental role in the development of such systems. Labeling is performed by users who are specialists in the data domain, and it aims to produce a dataset on which a supervised ML model can be trained. However, labeling is exhausting for users, which can compromise the quality of the ML system, especially when the labeling is done in systems that were not designed to assist the user in this activity. At the same time, it can be difficult for a software engineer to design these kinds of systems: depending on the type of data to be labeled, the interface needs different graphical elements and strategies to present the data and request user feedback. To help software engineers develop such systems, this work proposes the LabelUX guidelines. These guidelines aim to support software engineers in designing data labeling systems, leading to a quality design that provides a better user experience during the labeling task. We developed the guidelines from studies of the literature and of industry practice. We selected software engineers working on ML projects to participate in a feasibility study evaluating the use of the guidelines. The qualitative results obtained through the interviews indicated that the LabelUX guidelines supported a better design of labeling systems for textual data.