Multi-Entity Polarity Analysis in Financial Documents
Abstract
The volume of information available on the Internet makes manual content analysis impractical for identifying information of interest. Automated analyses are used instead, and one increasingly important approach is polarity analysis: the classification of a text document as positive, negative, or neutral with respect to a given topic. This classification is particularly useful in the financial domain, where news about a company can affect the performance of its stock. Most methods in the financial domain assume that a whole document is associated with a single entity, but this is not always the case. Authors commonly cite several entities in a single document, and these entities may be cited with different polarities. Accordingly, the objective of this paper was to study strategies for polarity detection in financial documents with multiple entities. Specifically, we studied methods that learn multiple models, one for each observed entity, using SVM classifiers. We evaluated models built by partitioning documents into fragments according to the entities they cite, using several segmentation heuristics based on shallow and deep natural language processing (NLP). We found that entity-specific models created by partitioning the document collection into segments outperformed the strategy based on entire documents. We also observed that more complex segmentation using anaphora resolution did not outperform a low-cost approach based on simple string matching.
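To make the low-cost segmentation idea concrete, the following is a minimal sketch of partitioning a document into per-entity fragments by simple string matching. It is an illustration under stated assumptions, not the authors' implementation: the function name `segment_by_entity`, the alias dictionary, the sentence-level granularity, and the example text are all hypothetical.

```python
import re
from collections import defaultdict


def segment_by_entity(document, entity_aliases):
    """Group sentences by the entities they mention.

    Matching is a case-insensitive substring test, an assumed stand-in
    for the "simple string matching" heuristic named in the abstract.
    """
    # Naive sentence splitter: break on whitespace that follows ., ! or ?
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", document)
        if s.strip()
    ]
    fragments = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for entity, aliases in entity_aliases.items():
            # A sentence citing several entities lands in each entity's fragment.
            if any(alias.lower() in lowered for alias in aliases):
                fragments[entity].append(sentence)
    return dict(fragments)


# Hypothetical example document and entity aliases.
doc = (
    "Acme shares rose after strong earnings. "
    "Meanwhile, Globex cut its forecast. "
    "Analysts expect Acme to outpace Globex next quarter."
)
aliases = {"ACME": ["Acme"], "GLOBEX": ["Globex"]}
fragments = segment_by_entity(doc, aliases)
```

In a setup like the one the abstract describes, each entity's fragments would then serve as the training and test instances for that entity's own classifier, rather than the entire document.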
Published
18/11/2014
How to Cite
FERREIRA, Javier Zambrano; RODRIGUES, Josiane; CRISTO, Marco; OLIVEIRA, David Fernandes de. Multi-Entity Polarity Analysis in Financial Documents. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 20., 2014, João Pessoa. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 115-122.