Detection of Big Five personality traits in Twitter users' profiles based on textual posts
Abstract
Personality describes how people behave and can influence their choices and decision-making. One of the most established models of personality traits is the Big Five. Few studies have addressed the detection of these traits from Portuguese-language texts shared on social networks. This work aims to build a dataset of Portuguese tweets labeled with the dominant personality trait and to assess its potential for use with classical machine learning models. In the experiment, the machine learning algorithms performed better when the SMOTE technique was included, and the best result was obtained by Logistic Regression with TF-IDF unigram features.
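The two preprocessing ideas named above, TF-IDF unigram features and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the tweets, the labels, and the helper names `tfidf_unigrams` and `smote_like` are hypothetical, the smoothed IDF formula is only one common variant, and the classifier step (for example, logistic regression) is left out. In practice, libraries such as scikit-learn and imbalanced-learn provide tested versions of these steps.

```python
import math
import random
from collections import Counter


def tfidf_unigrams(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each unigram.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant; other definitions exist).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Hypothetical toy data: five majority-class and three minority-class tweets.
tweets = [
    "adoro festa com amigos",
    "hoje tem festa na praia",
    "amigos e festa sempre",
    "vamos sair com amigos hoje",
    "festa boa com todo mundo",
    "estou muito ansioso hoje",
    "ansioso e preocupado com tudo",
    "preocupado demais com a prova",
]
labels = ["extraversion"] * 5 + ["neuroticism"] * 3

vocab, X = tfidf_unigrams(tweets)
minority = [x for x, y in zip(X, labels) if y == "neuroticism"]
n_missing = labels.count("extraversion") - len(minority)
synthetic = smote_like(minority, n_missing)

X_balanced = X + synthetic
y_balanced = labels + ["neuroticism"] * len(synthetic)
print(Counter(y_balanced))
```

When oversampling is combined with cross-validation, the synthetic samples are normally generated from the training folds only, so that no interpolated copy of a test example leaks into training.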