User Classification on Online Social Networks by Post Frequency

  • Gabriel Tavares Universidade Estadual de Londrina
  • Saulo Mastelini Universidade Estadual de Londrina
  • Sylvio Jr. Universidade Estadual de Londrina

Abstract


This paper proposes a technique for classifying user accounts in Online Social Networks (OSNs) in order to detect fraud. The main purpose of the classification is to distinguish among three types of users: humans, bots, and cyborgs. Classic, well-established Text Mining approaches rely on textual features from Natural Language Processing (NLP) for classification, but drawbacks such as computational cost and the sheer volume of data can arise in real-life scenarios. This work instead uses statistical parameters of each user's posting frequency to distinguish the user types without examining textual content. We run the experiment on a Twitter dataset and compare five learning algorithms on the classification task: Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Using the default parameters of each algorithm, we achieved accuracies of 88% with RF and 84% with XGBoost.
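To make the idea of frequency-based features concrete, the following minimal sketch derives a few posting-rhythm statistics from post timestamps alone. The function name `posting_frequency_features`, the specific statistics chosen, and the sample account are assumptions for illustration; the abstract does not list the exact parameters used in the paper.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def posting_frequency_features(timestamps):
    """Summarize a user's posting rhythm from post timestamps alone.

    The feature set is a hypothetical choice for illustration;
    no textual content is used.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two posts to measure intervals")
    # Gaps between consecutive posts, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    # Time covered by the observed posts, in days.
    span_days = (ts[-1] - ts[0]).total_seconds() / 86400.0
    return {
        "mean_gap_s": mean(gaps),
        "std_gap_s": pstdev(gaps),
        "min_gap_s": min(gaps),
        "max_gap_s": max(gaps),
        "posts_per_day": len(ts) / span_days if span_days > 0 else float(len(ts)),
    }


# Hypothetical account that posts exactly every 10 minutes.
start = datetime(2017, 1, 1, 0, 0, 0)
regular = [start + timedelta(minutes=10 * i) for i in range(7)]
features = posting_frequency_features(regular)
print(features)
```

A feature vector of this kind, computed per account, could then be passed to any of the classifiers named above. In this sketch, a standard deviation of zero simply reflects the perfectly regular sample schedule; whether such regularity indicates automation is a question for the trained model, not something this code decides.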

Keywords: Online Social Networks, Machine Learning, User Classification, Twitter

Published
17/05/2017
TAVARES, Gabriel; MASTELINI, Saulo; JR., Sylvio. User Classification on Online Social Networks by Post Frequency. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 13., 2017, Lavras. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 464-471. DOI: https://doi.org/10.5753/sbsi.2017.6076.