Hierarchical Cluster Labeling of Software Requirements Using Contextual Word Embeddings

  • Adailton Araujo USP
  • Ricardo Marcacini USP


The popularization of social media has motivated research into machine learning methods for extracting software requirements from user comments and reviews. These methods analyze software review datasets and classify textual excerpts as software requirements, which makes it possible to manage and monitor the evolution of software quality directly from the perspective of crowd users. However, existing methods have two major limitations. First, they extract many duplicate requirements from reviews, because users write the same requirement in different ways, using synonyms and non-technical language, often with misspellings and ambiguity. Second, they do not handle different granularity levels and therefore ignore hierarchical relationships between software requirements. To address these challenges, this paper presents a hierarchical cluster labeling approach for software requirements based on contextual word embeddings. We explore neural language models to obtain a more semantic and robust representation of the software requirements, in which the texts are represented by contextual word embeddings. Our approach organizes the software requirements into clusters and sub-clusters according to their similarity in the embedding space. Finally, we select representative software requirements to label each cluster and sub-cluster, thereby handling both duplicate requirements and requirements at different granularity levels. An experimental evaluation using review datasets from eight mobile apps shows that our approach obtains promising results and suggests new ideas and research directions for data-driven requirements engineering.
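The pipeline described above (represent requirements as vectors, group them into clusters and sub-clusters by similarity, then label each group with a representative requirement) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the average-link agglomerative clustering, the two similarity thresholds `coarse` and `fine`, the medoid as the "representative" requirement, and the hand-made 3-dimensional vectors are all assumptions made for the example. In practice the vectors would come from a contextual language model.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def avg_link(c1, c2, vecs):
    """Average pairwise similarity between two clusters of indices."""
    return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def agglomerate(indices, vecs, min_sim):
    """Merge the most similar pair of clusters until no pair reaches min_sim."""
    clusters = [[i] for i in indices]
    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), avg_link(clusters[x], clusters[y], vecs))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < min_sim:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

def medoid(cluster, vecs):
    """Index of the member most similar, in total, to the rest of its cluster."""
    return max(cluster, key=lambda i: sum(cosine(vecs[i], vecs[j]) for j in cluster))

def label_hierarchy(texts, vecs, coarse=0.5, fine=0.9):
    """Two-level hierarchy: coarse clusters, each split into fine sub-clusters."""
    tree = []
    for cluster in agglomerate(range(len(texts)), vecs, coarse):
        children = [
            {"label": texts[medoid(sub, vecs)], "members": [texts[i] for i in sub]}
            for sub in agglomerate(cluster, vecs, fine)
        ]
        tree.append({"label": texts[medoid(cluster, vecs)], "children": children})
    return tree

# Toy requirements with hand-made stand-in embeddings (hypothetical values).
texts = [
    "app crashes on login",
    "crash when signing in",
    "login with fingerprint",
    "add dark mode",
    "please add night theme",
]
vecs = [
    [1.0, 0.1, 0.0],
    [0.95, 0.15, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.1, 1.0],
    [0.05, 0.1, 0.98],
]

tree = label_hierarchy(texts, vecs)
for node in tree:
    print(node["label"])
    for child in node["children"]:
        print("  ", child["label"], "<-", child["members"])
```

With these toy vectors, the two login-crash phrasings fall into the same sub-cluster (a duplicate), while the fingerprint request becomes a sibling sub-cluster under the same login-related cluster (a different granularity level). The thresholds would need tuning on real embeddings.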
ARAUJO, Adailton; MARCACINI, Ricardo. Hierarchical Cluster Labeling of Software Requirements Using Contextual Word Embeddings. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 35., 2021, Joinville. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.