Personality classification on Twitter: An analysis on multicultural learning transfer feasibility
Abstract
Extracting information about personality is a research topic with several applications, such as recommendation systems and recruitment processes. The field has well-established and efficient resources and methodologies, but applications are heavily concentrated on English-language data. This article contributes to personality classification for languages with fewer resources by leveraging techniques developed for English, so that advances in personality classification made for English-speaking audiences can carry over to other languages and cultures. To this end, Data Mining, Natural Language Processing, and Machine Learning techniques with Word Embedding are used to explore the correlation between personality traits and lexical properties of text, using data from Online Social Networks. The results are satisfactory when compared with related research applied only to English, demonstrating that techniques previously aimed at a single culture can be applied more widely and indicating that the multicultural barrier present in the literature can be overcome.
Keywords:
Personality Classification, Machine Learning, Natural Language Processing, Word Embedding
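As a rough illustration of the kind of pipeline the abstract describes (word embeddings as text features, then a correlation with a personality trait), the following minimal sketch averages word vectors per text and computes a Pearson correlation. It is not the authors' implementation: the embedding table, the example texts, the trait scores, and the function names `mean_embedding` and `pearson` are all hypothetical and chosen only for illustration.

```python
import math
import re

# Toy 2-dimensional "embeddings": purely illustrative values,
# not taken from any trained model or from the paper.
TOY_EMBEDDINGS = {
    "party": [0.9, 0.1],
    "friends": [0.8, 0.2],
    "book": [0.1, 0.9],
    "quiet": [0.0, 1.0],
}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mean_embedding(text, embeddings, dim=2):
    """Average the vectors of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Hypothetical texts with made-up trait scores (assumption, not real data).
    texts = ["Party with friends!", "A quiet book", "Friends and a book"]
    trait_scores = [4.5, 1.5, 3.0]
    first_dim = [mean_embedding(t, TOY_EMBEDDINGS)[0] for t in texts]
    print(pearson(first_dim, trait_scores))
```

In practice the toy table would be replaced by pretrained vectors for the target language, and the correlation step by a supervised model trained on labelled profiles; both choices are assumptions here rather than details stated in the abstract.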
Published
2021-07-18
How to Cite
OLIVEIRA, Arthur Pereira de; SERUFFO, Marcos César da Rocha. Personality classification on Twitter: An analysis on multicultural learning transfer feasibility. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 10., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 67-78. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2021.16126.
