Evaluating Ontology Development from the Extraction of Noun Phrases

  • Alexandra Moreira Universidade Federal de Viçosa
  • Alcione Oliveira Universidade Federal de Viçosa
  • Jugurta Lisboa-Filho Universidade Federal de Viçosa

Abstract


There are several methods for constructing an ontology. Among the automatic methods, one approach is to extract terms from domain documents and then organize them. In this approach, the first step is to extract noun phrases that are candidate components of the terminology of the domain of interest. This article describes an automatic tool for Brazilian Portuguese that extracts noun phrases that can be adopted as terms for a given domain. The system also couples the extracted terms to a top-level ontology, producing an initial ontology that can be further refined. An anchor term was used to couple the terms to the ontology, and a statistical analysis showed that using the anchor term improves the system's performance. The tool was used to select terms for an ontology of the power sector domain, and the precision of the resulting ontology was evaluated: the technique generated the correct hierarchy for 70% of the terms.
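The abstract does not spell out the extraction rules, so the sketch below is only an illustration of the general idea, not the authors' tool. It assumes input that a part-of-speech tagger has already processed, written in a hypothetical simplified tagset (`N`, `ADJ`, `PREP`, with any other tag ending a phrase). It collects spans matching `N ADJ* (PREP N ADJ*)*`, a pattern that covers common Portuguese terms such as "linha de transmissão". It also suggests a parent concept from the phrase head. Portuguese noun phrases are usually left-headed, so the first noun is taken as the head. The function names and the head heuristic are hypothetical. The `precision` helper simply shows how a figure like "70% of the terms" is computed: correctly placed terms divided by evaluated terms.

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (token, tag) pairs from a POS tagger


def extract_noun_phrases(tagged: Tagged) -> List[str]:
    """Return maximal spans matching N ADJ* (PREP N ADJ*)*.

    Assumes a hypothetical simplified tagset: N, ADJ, PREP; any other
    tag ends the current phrase.
    """
    phrases: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] != "N":
            i += 1
            continue
        j = i + 1
        # adjectives modifying the head noun, e.g. "rede elétrica"
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        # prepositional complements, e.g. "linha de transmissão"
        while j + 1 < n and tagged[j][1] == "PREP" and tagged[j + 1][1] == "N":
            j += 2
            while j < n and tagged[j][1] == "ADJ":
                j += 1
        phrases.append(" ".join(tok for tok, _ in tagged[i:j]))
        i = j
    return phrases


def suggest_parent(phrase: str) -> Optional[str]:
    """Hypothetical heuristic: a multiword phrase is-a its head noun."""
    words = phrase.split()
    return words[0] if len(words) > 1 else None


def precision(correct: int, evaluated: int) -> float:
    """Fraction of evaluated terms placed in the correct hierarchy."""
    return correct / evaluated if evaluated else 0.0


if __name__ == "__main__":
    sentence = [("A", "ART"), ("linha", "N"), ("de", "PREP"),
                ("transmissão", "N"), ("elétrica", "ADJ"), ("conecta", "V"),
                ("a", "ART"), ("subestação", "N")]
    for np in extract_noun_phrases(sentence):
        print(np, "->", suggest_parent(np))
```

For the example sentence, this yields "linha de transmissão elétrica" (suggested parent "linha") and "subestação" (no parent, so it would have to be attached to the upper ontology some other way). The anchor-term mechanism the authors describe is not modeled here.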

Keywords: Ontology Development, Noun Phrase Extraction, Brazilian Portuguese

Published: 2019-10-15
MOREIRA, Alexandra; OLIVEIRA, Alcione; LISBOA-FILHO, Jugurta. Evaluating Ontology Development from the Extraction of Noun Phrases. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 84-95. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9274.