Investigating Opinion Mining through Language Varieties: A Case Study of Brazilian and European Portuguese Tweets
Portuguese is a pluricentric language whose variants differ from one another at several linguistic levels. It is generally agreed that text mining resources developed for one variant may produce different results when applied to another, but very little research has been done to measure this difference. This study analyzes the behavior of an opinion mining application on the two main variants of Portuguese: Brazilian and European. Our experiments show that the differences between the variants are reflected in the application results. Training on one variant and testing on the other causes a substantial drop in performance, but separating the variants may nonetheless not be recommended.