Using Statistical Features to Find Phrasal Terms in Text Collections

Andre Luiz da Costa Carvalho; Edleno Silva de Moura; Pável Calado

doi:10.5753/jidm.2010.1296

Authors

Andre Luiz da Costa Carvalho Universidade Federal do Amazonas
Edleno Silva de Moura No affiliation declared
Pável Calado No affiliation declared

DOI:

https://doi.org/10.5753/jidm.2010.1296

Keywords:

phrasal terms, phrase queries

Abstract

In this work we investigate alternatives for automatically detecting phrasal terms, defined here as the phrasal verbs, phrasal nouns, phrasal adjectives, or phrasal adverbs found in a text. Automatic identification of phrasal terms has several potential applications in text processing systems. We present a novel approach for detecting phrasal terms in a collection of documents. Our solution is based on machine learning and uses statistical features of the word n-grams found in the documents. We also investigate the specific impact of adding phrasal terms to the retrieval model of a search engine when processing queries over several data sets. Our results show that we can discover valid phrasal terms with a small error rate, with detection results ranging from 70% to 94% in terms of F1. Furthermore, when the discovered phrasal terms are used to enhance search tasks, they improve retrieval performance by up to 11% in terms of MAP over all queries, and by up to 36% in terms of MAP when only the queries containing the detected phrasal terms are considered.
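The abstract does not list the exact statistical features or the learning algorithm used, so the following is only a minimal sketch, under stated assumptions, of what "statistical features of word n-grams" can look like for candidate bigrams. The function name `bigram_features` and the three features (corpus count, document frequency, and pointwise mutual information, a common collocation statistic) are hypothetical choices for illustration, not the authors' feature set. In a setup like the one described, each candidate's feature vector would then be passed to a supervised classifier that labels it as a phrasal term or not.

```python
import math
from collections import Counter


def bigram_features(docs):
    """Compute illustrative statistical features for every word bigram.

    Hypothetical feature set (an assumption, not the paper's):
      - count:    occurrences of the bigram in the whole collection
      - doc_freq: number of documents containing the bigram
      - pmi:      pointwise mutual information, log2(P(w1 w2) / (P(w1) P(w2)))

    docs: list of documents, each a list of tokens.
    Returns a dict mapping (w1, w2) -> {"count", "doc_freq", "pmi"}.
    """
    unigrams = Counter()
    bigrams = Counter()
    doc_freq = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        pairs = list(zip(tokens, tokens[1:]))  # adjacent word pairs
        bigrams.update(pairs)
        doc_freq.update(set(pairs))  # count each bigram at most once per document

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    features = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        features[(w1, w2)] = {
            "count": count,
            "doc_freq": doc_freq[(w1, w2)],
            "pmi": math.log2(p_xy / (p_x * p_y)),
        }
    return features


if __name__ == "__main__":
    docs = [
        ["look", "up", "the", "word"],
        ["look", "up", "a", "term"],
        ["the", "word", "list"],
    ]
    feats = bigram_features(docs)
    # The candidate phrasal verb "look up" co-occurs more strongly
    # than the incidental pair "up the".
    print(feats[("look", "up")])
    print(feats[("up", "the")])
```

In this toy collection the bigram "look up" gets a higher PMI than "up the", which is the kind of signal a classifier could exploit; a real system would compute such statistics over a large collection and likely combine many more features.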
