Doclass: open-source software to support document labeling and classification

  • Marcelo Inuzuka, Universidade Federal de Goiás
  • Hugo do Nascimento, Universidade Federal de Goiás
  • Fernando Almeida, Universidade Federal de Goiás
  • Bruno Barros, Universidade Federal de Goiás
  • Walid Jradi, Ultimatum Tecnologia Jurídica


This article introduces Doclass, a free and open-source web application that assists in labeling and classifying large sets of documents. The work followed a design science research methodology, guided by the real demands of a legal text processing company. The architecture, several design decisions, and the current development stage of the software are presented, and preliminary user experiments evaluating interactive document labeling are described. As a result, the first version of the system was published, with an architecture composed of a mobile frontend that communicates with a backend through a REST API; the company that requested the software evaluated its performance as satisfactory. Other results involve the use of active learning to reduce the human effort of classifying documents, with the Uncertainty strategy choosing the next document to be labeled. A stop criterion for active learning based on confidence level was also tested; its effectiveness proved unsatisfactory, so improving it remains future work.
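The labeling loop summarized in the abstract can be illustrated with a minimal sketch. It is not the Doclass implementation: the abstract does not say which uncertainty measure is used, so the least-confidence variant is assumed here, and the function names and the 0.9 threshold are hypothetical. Each inner list holds a classifier's predicted class probabilities for one unlabeled document.

```python
from typing import Optional, Sequence


def least_confident(probabilities: Sequence[Sequence[float]]) -> int:
    """Return the index of the document whose top class probability is lowest.

    Assumption: uncertainty is measured as least confidence, i.e. the
    probability of the most likely class. Ties go to the first document.
    """
    if not probabilities:
        raise ValueError("no unlabeled documents left")
    confidences = [max(p) for p in probabilities]
    return min(range(len(confidences)), key=confidences.__getitem__)


def next_document(
    probabilities: Sequence[Sequence[float]],
    confidence_threshold: float = 0.9,  # hypothetical value, not from the paper
) -> Optional[int]:
    """Return the index of the document to label next, or None to stop.

    Confidence-based stop criterion (assumed form): stop once even the
    least confident prediction reaches the threshold.
    """
    if not probabilities:
        return None
    index = least_confident(probabilities)
    if max(probabilities[index]) >= confidence_threshold:
        return None
    return index


# Three unlabeled documents, two classes each.
pool = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80]]
print(next_document(pool))  # the second document (index 1) is the most uncertain
```

A criterion of this kind relies on the classifier's confidence being well calibrated, which is one plausible reason it can stop too early or too late; the abstract reports only that the tested criterion was unsatisfactory.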

Keywords: document classification, active learning, annotation tool, document labeling, legal text


INUZUKA, Marcelo; DO NASCIMENTO, Hugo; ALMEIDA, Fernando; BARROS, Bruno; JRADI, Walid. Doclass: open-source software to support document labeling and classification. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 8., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 105-112. ISSN 2763-8944. DOI: