A Process for Building Domain Specific Thesauri for Query Expansion to Mine SW Documents Repositories within an Industrial Environment

  • Mailton Carvalho UFPE
  • Flavia Barros UFPE
  • Ricardo Prudêncio UFPE


In large software companies, retrieving relevant artifacts from indexed repositories containing thousands of textual documents depends strongly on the quality of the query. Existing search engines work well to some degree, but they struggle with word matching: often the indexed documents contain only synonyms of the query words, not the words themselves. The coverage of Information Retrieval systems can be improved through query expansion, which adds new (correlated) terms to the original query. These terms are usually obtained from a dictionary of synonyms (a thesaurus). Our work proposes a process for the automatic construction of domain thesauri from documents available in software companies’ local repositories. The aim is to avoid the ambiguous or non-correlated words found in generic thesauri. The implemented system was used to generate a domain thesaurus from the private documents of a real-world company. The resulting thesaurus was then used for query expansion to improve the performance of two local document retrieval systems, with very satisfactory results. This work was conducted within the context of a research cooperation project between Motorola Mobility (a Lenovo Company) and Centro de Informática (CIn-UFPE).
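To make the query expansion idea concrete, the following is a minimal sketch, assuming a thesaurus that can be represented as a plain mapping from a term to its correlated terms. The name `expand_query` and the entries in `TOY_THESAURUS` are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' thesaurus construction process.

```python
from typing import Dict, List

# Hypothetical toy thesaurus; the entries are illustrative, not from the paper.
TOY_THESAURUS: Dict[str, List[str]] = {
    "crash": ["failure", "freeze"],
    "camera": ["cam"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the original query terms followed by their correlated terms.

    Terms are lowercased and split on whitespace; duplicates are skipped.
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    print(expand_query("Camera crash", TOY_THESAURUS))
```

With a domain thesaurus in place of the toy mapping, the expanded term list would be passed to the retrieval system instead of the original query, so that documents containing only the correlated terms can still be matched.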

CARVALHO, Mailton; BARROS, Flavia; PRUDÊNCIO, Ricardo. A Process for Building Domain Specific Thesauri for Query Expansion to Mine SW Documents Repositories within an Industrial Environment. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 35., 2021, Joinville. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. ISSN 2833-0633.