An Analysis of Subjectivity in Brazilian News

D. F. Lima; A. S. C. Melo; L. B. Marinho

doi:10.5753/kdmile.2019.8792

D. F. Lima, Universidade Federal de Campina Grande
A. S. C. Melo, Universidade Federal de Campina Grande
L. B. Marinho, Universidade Federal de Campina Grande

Abstract

With the advent of digital journalism, the democratization of information has become a reality: news articles are published as soon as the facts occur and can be accessed from any internet-connected device. It is widely perceived that some newspapers are more biased than others in the way they present the facts. However, automatically measuring such biases is still an open research challenge. Under the premise that journalistic texts must use objective and unbiased language, news articles with high levels of subjectivity may indicate bias. In this paper, we propose using subjectivity lexicons to characterize subjectivity in five news portals that are popular in Brazil. To better understand the results, we performed a correlation analysis between the measured levels of subjectivity and metrics of readability and news popularity. We believe that our methods and findings contribute to a better understanding of the linguistic characteristics of the news we consume daily.
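The abstract does not spell out how a lexicon turns into a subjectivity level or which correlation coefficient is used. The sketch below illustrates the general idea under our own assumptions: a hypothetical toy lexicon (`TOY_LEXICON`), a score defined as the fraction of an article's tokens found in that lexicon, and Spearman's rank correlation as the correlation measure. None of these choices are confirmed as the authors' method, and the example articles and popularity values are made up for illustration.

```python
# Minimal sketch under stated assumptions: the lexicon, the scoring rule
# (fraction of lexicon tokens), the example articles, and the popularity
# values are all hypothetical, not taken from the paper.
import math
import re
from typing import AbstractSet, List, Sequence

# Hypothetical toy subjectivity lexicon (Portuguese words signalling opinion).
TOY_LEXICON = {"absurdo", "lamentável", "incrível", "polêmica", "péssimo"}


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of Unicode letters (accented words survive,
    # digits and punctuation are dropped).
    return re.findall(r"[^\W\d_]+", text.lower())


def subjectivity_score(text: str, lexicon: AbstractSet[str]) -> float:
    """Fraction of the text's tokens that appear in the lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))


if __name__ == "__main__":
    articles = [
        "O governo anunciou um plano absurdo e lamentável.",
        "O ministério publicou o relatório anual.",
        "Decisão polêmica gera reação.",
    ]
    # Hypothetical popularity metric for each article (e.g. share counts).
    shares = [120, 40, 95]
    scores = [subjectivity_score(a, TOY_LEXICON) for a in articles]
    print("subjectivity:", scores)
    print("Spearman rho vs. shares:", spearman(scores, shares))
```

Spearman's coefficient is used here because it only assumes a monotonic relationship, which suits skewed metrics such as popularity counts. A real analysis would substitute the actual subjectivity lexicons and could rely on a library routine such as `scipy.stats.spearmanr`, which also reports a p-value for the significance test.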

Keywords: Bias, Machine Learning, Natural Language Processing, News, Subjectivity
