Inferring the Sex and Age of Twitter Users
Abstract
Online social networks have attracted a large audience of users, who use their tools to discuss a wide variety of subjects. Several studies have already sought to identify the topics discussed on these networks, and a growing number of studies are focusing on the personal characteristics of the users who take part in these discussions. This work sets out to infer the sex and age of Twitter users. Some studies have attempted the same, but we found none that focus on users who write their messages in Portuguese. The methods for inferring sex and age developed in this work achieved accuracies of approximately 90% and 80%, respectively.