Personality classification on Twitter: An analysis on multicultural learning transfer feasibility

  • Arthur Pereira de Oliveira UFPA
  • Marcos César da Rocha Seruffo UFPA

Abstract


Extracting personality information from text is an active research topic with applications such as recommendation systems and recruitment processes. The field has well-established and efficient resources and methodologies, but they are heavily concentrated on English-language data. This article contributes to personality classification for languages with fewer resources by leveraging techniques developed for English, so that advances in personality classification made for English-speaking audiences can carry over to other languages and cultures. To this end, it uses Data Mining, Natural Language Processing, and Machine Learning with Word Embedding to explore the correlation between personality traits and lexical properties of text, using data from Online Social Networks. The results are satisfactory when compared with related research applied only to English, which demonstrates the feasibility of applying techniques previously aimed at a single culture more widely and indicates that the multicultural barrier present in the literature can be overcome.
Keywords: Personality Classification, Machine Learning, Natural Language Processing, Word Embedding
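
As a rough illustration of the kind of pipeline the abstract describes, the sketch below averages word vectors into a single text representation and measures the Pearson correlation between one embedding dimension and a trait score. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy embedding table, the function names `embed_text` and `pearson`, and the example values are all hypothetical, and a real study would load pretrained vectors and use actual questionnaire scores.

```python
from math import sqrt

# Hypothetical 3-dimensional toy embeddings (assumption: stand-in for
# pretrained word vectors, which are not reproduced here).
TOY_EMBEDDINGS = {
    "party": [0.9, 0.1, 0.0],
    "friends": [0.8, 0.2, 0.1],
    "book": [0.1, 0.9, 0.0],
    "quiet": [0.0, 0.8, 0.2],
}


def embed_text(text, embeddings, dim=3):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vectors = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical texts paired with made-up trait scores on a 1-5 scale.
    texts = ["party friends", "quiet book", "friends book", "party"]
    trait_scores = [4.5, 1.5, 3.0, 5.0]
    first_dimension = [embed_text(t, TOY_EMBEDDINGS)[0] for t in texts]
    print(pearson(first_dimension, trait_scores))
```

In a cross-lingual setting, the same averaging step could be applied with vectors trained for the target language, which is one way the idea of reusing English-oriented techniques might be approached.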

Published
2021-07-18
OLIVEIRA, Arthur Pereira de; SERUFFO, Marcos César da Rocha. Personality classification on Twitter: An analysis on multicultural learning transfer feasibility. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 10., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 67-78. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2021.16126.
