Using Neural Networks to Reconstruct User Session Traces in Large-Scale Systems
Abstract
Monitoring the online presence of entities in distributed systems is essential for understanding the behavior of such systems and simulating their dynamics, among other purposes. In many systems, online presence can be sampled by collecting, at regular time intervals, lists of the entities that are currently online. Examples include online users in distributed apps and active nodes on the Internet. The monitoring process may be flawed, though: some entities may fail to appear as online in one or more lists, which compromises the accuracy of the collected data. Previous investigations have applied statistical methods to identify such failures and used thresholds to correct them. In this paper, we investigate the potential of machine learning methods to regenerate monitoring data collected via sampling. In particular, we assess the potential of deep learning for correcting such data, and show that accuracy, precision, and recall can be substantially improved compared to existing statistical methods.
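To make the problem setting concrete, the sketch below models one entity's trace as a list of 0/1 samples (one per snapshot) and applies a simple threshold rule: a run of missing samples no longer than `max_gap` that lies between two online samples is treated as a monitoring failure and filled in. This is only an illustration of the general threshold idea, not the method of the paper or of the prior work it mentions; the function names, the `max_gap` parameter, and the example traces are all assumptions made for this sketch.

```python
from typing import List, Tuple


def fill_short_gaps(presence: List[int], max_gap: int) -> List[int]:
    """Fill runs of at most `max_gap` missing samples that lie between
    two online samples (hypothetical threshold rule, for illustration)."""
    corrected = list(presence)
    last_online = None  # index of the most recent snapshot listing the entity
    for i, seen in enumerate(presence):
        if seen:
            if last_online is not None:
                gap = i - last_online - 1
                if 0 < gap <= max_gap:
                    # Short gap: assume a sampling failure, not a real departure.
                    for j in range(last_online + 1, i):
                        corrected[j] = 1
            last_online = i
    return corrected


def precision_recall(truth: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Precision and recall of the 'online' class against a ground-truth trace."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: the entity was online in snapshot 1, but the monitor missed it.
truth = [1, 1, 1, 1, 0, 0, 0, 1, 1]
observed = [1, 0, 1, 1, 0, 0, 0, 1, 1]
corrected = fill_short_gaps(observed, max_gap=1)
print(precision_recall(truth, observed))
print(precision_recall(truth, corrected))
```

With a small `max_gap`, the one-sample hole is repaired while the genuine three-sample absence is left alone. A larger threshold would also fill real departures, which is the kind of trade-off that motivates learning the correction from data instead of fixing a threshold by hand.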
