Discovering user traffic profiles: an unsupervised approach

  • Ananda G. Streit UFRJ
  • Rosa M. M. Leão UFRJ
  • Edmundo de Souza e Silva UFRJ
  • Daniel S. Menasché UFRJ

Abstract


The increasing complexity of home networks calls for novel strategies for efficient network management and workload characterization. In this work, we use unsupervised machine learning techniques to discover users’ traffic profiles. In partnership with an ISP, we collected download and upload traffic from more than 2,000 home routers of the ISP’s clients. We then apply a tensor decomposition technique (PARAFAC) to extract relevant features from these network traces. Using the PARAFAC results and a hierarchical clustering algorithm, we group users with similar daily traffic patterns. To characterize users’ behavior over periods longer than a day, we combine the cluster information with a Hidden Markov Model.
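As a rough illustration of the first two steps described in the abstract (tensor decomposition followed by hierarchical clustering), the sketch below fits a rank-1 PARAFAC model to a tiny synthetic users × time slots × direction tensor by alternating least squares, then groups users by their loadings with average-linkage agglomerative clustering. Everything here is an assumption made for illustration: the synthetic data, the tensor layout, the rank-1 restriction, the linkage choice, and the function names `rank1_parafac` and `agglomerate` are hypothetical and are not taken from the paper. The Hidden Markov Model step is not covered.

```python
import math

def rank1_parafac(X, iters=20):
    """Fit X[u][t][m] ~ a[u] * b[t] * c[m] by alternating least squares."""
    U, T, M = len(X), len(X[0]), len(X[0][0])
    a, b, c = [0.0] * U, [1.0] * T, [1.0] * M
    for _ in range(iters):
        nb = sum(v * v for v in b)
        nc = sum(v * v for v in c)
        # Least-squares update of the user loadings with b and c fixed.
        a = [sum(X[u][t][m] * b[t] * c[m] for t in range(T) for m in range(M))
             / (nb * nc) for u in range(U)]
        na = sum(v * v for v in a)
        # Least-squares update of the time-slot profile with a and c fixed.
        b = [sum(X[u][t][m] * a[u] * c[m] for u in range(U) for m in range(M))
             / (na * nc) for t in range(T)]
        nb = sum(v * v for v in b)
        # Least-squares update of the direction profile with a and b fixed.
        c = [sum(X[u][t][m] * a[u] * b[t] for u in range(U) for t in range(T))
             / (na * nb) for m in range(M)]
        # Normalize b and c to unit length; the scale is carried by a.
        sb = math.sqrt(sum(v * v for v in b))
        sc = math.sqrt(sum(v * v for v in c))
        b = [v / sb for v in b]
        c = [v / sc for v in c]
    # Final user loadings for the unit-norm profiles.
    a = [sum(X[u][t][m] * b[t] * c[m] for t in range(T) for m in range(M))
         for u in range(U)]
    return a, b, c

def agglomerate(values, k):
    """Average-linkage agglomerative clustering of scalar values into k groups."""
    clusters = [[i] for i in range(len(values))]

    def dist(p, q):
        return sum(abs(values[i] - values[j]) for i in p for j in q) / (len(p) * len(q))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(cl) for cl in clusters]

# Synthetic data (assumed): 6 users, 4 time slots, 2 directions (download, upload).
scales = [10.0, 11.0, 9.0, 1.0, 1.2, 0.8]  # three heavy users, three light users
daily_shape = [1.0, 2.0, 3.0, 4.0]         # shared time-of-day profile
direction = [3.0, 1.0]                     # download dominates upload
X = [[[s * h * d for d in direction] for h in daily_shape] for s in scales]

loadings, time_profile, direction_profile = rank1_parafac(X)
clusters = agglomerate(loadings, 2)
print(clusters)
```

On real traces a single component would rarely suffice; a higher-rank decomposition from a numerical library would be the more realistic choice, with clustering applied to the full vector of user loadings rather than to one scalar per user.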

Keywords: Network Management, Machine Learning, Monitoring, Data Analysis

Published
2019-05-06
STREIT, Ananda G.; LEÃO, Rosa M. M.; DE SOUZA E SILVA, Edmundo; MENASCHÉ, Daniel S. Discovering user traffic profiles: an unsupervised approach. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 37., 2019, Gramado. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 169-182. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2019.7358.
