Using Statistical Features to Find Phrasal Terms in Text Collections

Authors

  • Andre Luiz da Costa Carvalho, Universidade Federal do Amazonas
  • Edleno Silva de Moura, no affiliation declared
  • Pável Calado, no affiliation declared

DOI:

https://doi.org/10.5753/jidm.2010.1296

Keywords:

phrasal terms, phrase queries

Abstract

In this work we investigate alternatives for automatically detecting phrasal terms, defined here as phrasal verbs, phrasal nouns, phrasal adjectives, or phrasal adverbs found in a text. The automatic identification of phrasal terms may have several applications in text processing systems. We present a novel approach for detecting phrasal terms in a collection of documents. Our solution is based on machine learning and uses statistical features of the word n-grams found in the documents. We also investigate the impact of adding phrasal terms to the retrieval model of a search engine when processing queries on several data sets. Our results show that we are able to discover valid phrasal terms with a small error rate, achieving F1 scores ranging from 70% to 94%. Furthermore, the discovered phrasal terms, when used to enhance search tasks, improve retrieval performance by up to 11% in mean average precision (MAP) when considering all queries, and by up to 36% in MAP when considering only the queries that contain the detected phrasal terms.
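To illustrate what "statistical features of the word n-grams" could look like in practice, the following is a minimal Python sketch. It is not the authors' feature set, which the abstract does not list. It assumes, purely for illustration, three features per bigram: the raw count, the number of documents containing the bigram, and pointwise mutual information (PMI). The function name `bigram_features` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bigram_features(docs):
    """Compute illustrative statistical features for each word bigram.

    `docs` is a list of token lists. The chosen features (count,
    document frequency, PMI) are assumptions for this sketch, not the
    feature set used in the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    doc_freq = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        pairs = list(zip(tokens, tokens[1:]))
        bigrams.update(pairs)
        # Count each bigram at most once per document.
        doc_freq.update(set(pairs))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    features = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / total_bi
        p_x = unigrams[w1] / total_uni
        p_y = unigrams[w2] / total_uni
        features[(w1, w2)] = {
            "count": count,
            "doc_freq": doc_freq[(w1, w2)],
            # PMI is positive when the words co-occur more often than
            # independence would predict.
            "pmi": math.log2(p_xy / (p_x * p_y)),
        }
    return features


# Toy collection (hypothetical data).
docs = [
    "please look up the word".split(),
    "look up the answer".split(),
    "the word is short".split(),
]
feats = bigram_features(docs)
print(feats[("look", "up")])
```

Feature vectors of this kind could then be passed to any supervised classifier trained on n-grams labeled as phrasal or non-phrasal; which classifier and which features the paper actually uses is not stated in the abstract.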

Published

2010-10-06

How to Cite

Carvalho, A. L. da C., Moura, E. S. de, & Calado, P. (2010). Using Statistical Features to Find Phrasal Terms in Text Collections. Journal of Information and Data Management, 1(3), 583. https://doi.org/10.5753/jidm.2010.1296

Section

Regular Papers