A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality
Data classification is a major machine learning paradigm that has been widely applied to solve a large number of real-world problems. Traditional data classification techniques consider only the physical features (e.g., distance, similarity, or distribution) of the input data; for this reason, they are called low-level classification techniques. The human (and animal) brain, on the other hand, performs both low- and high-order learning, and it excels at identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also pattern formation is referred to as high-level classification. Several high-level classification techniques have been developed that make use of complex networks to characterize data patterns, and they have obtained promising results. In this paper, we propose a pure network-based high-level classification technique that uses the betweenness centrality measure. We test this model on nine different real-world datasets and compare it with nine other traditional and well-known classification models. The results show a competitive classification performance.
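As a minimal illustration of the betweenness centrality measure (Freeman, 1977) on which the proposed technique relies, the sketch below computes it for an unweighted, undirected graph using Brandes' accumulation scheme. This is only an assumed reference implementation of the measure itself, not the authors' classification pipeline.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted
    graph given as an adjacency dict {node: [neighbors]} (Brandes-style)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths via BFS, recording path counts.
        stack = []
        pred = {v: [] for v in adj}          # predecessors on shortest paths
        sigma = {v: 0 for v in adj}          # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # w found for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # edge on a shortest path
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair is counted from both endpoints; halve the totals.
    return {v: c / 2 for v, c in bc.items()}
```

For example, on the path graph a–b–c, the middle node b lies on the single shortest path between a and c, so its betweenness is 1 while the endpoints score 0.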