A Parallel System for Predicting User Information in Social Networks
Abstract
This work proposes a method for classifying information obtained from social networks by means of a multi-stage classifier. The classifier, structured in two levels, uses data collected from social networks to estimate information about a user according to a classification criterion. The criterion chosen and investigated here is age, although the method can be readily adapted to estimate other kinds of information. The classifier uses the Bhattacharyya distance and the Kullback-Leibler divergence to relate information collected from social networks to the information provided for the user whose age is to be estimated. Because this kind of application involves a large volume of data, this work also presents the strategy for distributing and computing the data with the proposed method.
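As a rough illustration of the two measures named in the abstract, the sketch below computes the Bhattacharyya distance and the Kullback-Leibler divergence between discrete distributions and picks the age group whose reference distribution is closest to a target user's. The age groups, the four-term vocabulary, the counts, and the helper names (`normalize`, `closest_group`) are hypothetical and chosen only for this example; the abstract does not describe the features or the two-level structure of the actual classifier, so this is not the authors' implementation.

```python
import math


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for discrete distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc) if bc > 0 else math.inf


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * ln(p_i / q_i); eps guards against q_i = 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def closest_group(target, profiles, distance):
    """Return the label whose reference distribution is nearest to target."""
    return min(profiles, key=lambda group: distance(target, profiles[group]))


# Hypothetical term counts over a shared four-term vocabulary (assumption).
profiles = {
    "13-19": normalize([40, 30, 20, 10]),
    "20-29": normalize([10, 20, 30, 40]),
}
target = normalize([35, 30, 25, 10])

print(closest_group(target, profiles, bhattacharyya_distance))
print(closest_group(target, profiles, kl_divergence))
```

Note that the Kullback-Leibler divergence is asymmetric, so the order of its arguments matters, whereas the Bhattacharyya distance is symmetric; a nearest-profile rule like the one above is only one simple way of turning either measure into a class decision.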
Keywords:
Distributed systems, Information prediction, Bhattacharyya distance, Kullback–Leibler divergence, Classification model
References
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, “Measurement and analysis of online social networks,” in Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, ser. IMC ’07. New York, NY, USA: ACM, 2007, pp. 29–42. [Online]. Available: http://doi.acm.org/10.1145/1298306.1298311
HubSpot. (2013, Jun.) State of the twittersphere. [Online]. Available: http://bit.ly/sotwitter/
TheBritishChamber. (2013, Jun.) The growing influence of China’s Weibo. [Online]. Available: http://britishchamber.cn/content/growing-influence-china’s-weibo
R. Campbell, C. Martin, and B. Fabos, Media and Culture: An Introduction to Mass Communication. Bedford/St. Martin’s, 2011.
M. W. Bauer, “Classical content analysis: A review,” Qualitative researching with text, image and sound, pp. 131–151, 2000.
C. H. Lau, Y. Li, and D. Tjondronegoro, “Microblog retrieval using topical features and query expansion,” in TREC, E. M. Voorhees and L. P. Buckland, Eds. National Institute of Standards and Technology (NIST), 2011.
W. Hua, T. D. Huynh, S. Hosseini, J. Lu, and X. Zhou, “Information extraction from microblogs: A survey,” International Journal of Software and Informatics, vol. 6, no. 4, pp. 495–522, 2012.
Twitalyze. (2013, May) Twitalyze. [Online]. Available: http://www.twitalyzer.com/
Ttweetstats. (2013, May) Ttweetstats. [Online]. Available: http://www.tweetstats.com/
B. statistics. (2013, May) Brandtweet statistics. [Online]. Available: http://stats.brandtweet.com/
S. Argamon, M. Koppel, J. W. Pennebaker, and J. Schler, “Mining the blogosphere: Age, gender and the varieties of self-expression,” First Monday, vol. 12, no. 9, 2007.
C. J. van Heerden, E. Barnard, M. H. Davel, C. van der Walt, E. van Dyk, M. Feld, and C. A. Müller, “Combining regression and classification methods for improving automatic speaker age recognition,” in ICASSP. IEEE, 2010, pp. 5174–5177.
R. Dey, C. Tang, K. W. Ross, and N. Saxena, “Estimating age privacy leakage in online social networks,” in INFOCOM, A. G. Greenberg and K. Sohraby, Eds. IEEE, 2012, pp. 2836–2840.
J. D. Burger, J. Henderson, G. Kim, and G. Zarrella, “Discriminating gender on twitter,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, ser. EMNLP ’11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011, pp. 1301–1309. [Online]. Available: http://dl.acm.org/citation.cfm?id=2145432.2145568
G. Hjorth, “Classification problems in mathematics,” University of California, Berkeley, 2010.
E. Choi and C. Lee, “Feature extraction based on the Bhattacharyya distance,” Pattern Recognition, vol. 36, no. 8, pp. 1703–1709, 2003. [Online]. Available: http://dblp.uni-trier.de/db/journals/pr/pr36.html#ChoiL03
S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951.
R. T. Fielding, “REST: architectural styles and the design of network-based software architectures,” Doctoral dissertation, University of California, Irvine, 2000. [Online]. Available: http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
D. Crockford, “RFC 4627: JavaScript Object Notation,” 2006.
R. A. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
Published
23/10/2013
How to Cite
FREITAS, Pedro Garcia; SOUZA, Márcio A. Silva; ARAÚJO, Aletéia P. F. de; WEIGANG, Li; FONSECA, Érico Marx P.; FARIAS, Mylène C. Q. Um Sistema Paralelo para Predizer Informações de Usuários em Redes Sociais. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 14., 2013, Porto de Galinhas. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2013. p. 26-33. DOI: https://doi.org/10.5753/wscad.2013.16770.