Symbolic Flow Representation Based on the First-M Packets for Early Traffic Classification

  • Marcelo A. C. Fernandes (UFRN)

Abstract

This paper presents a symbolic flow-level representation for early network traffic classification based on the first M packets of a bidirectional flow. Each packet payload is converted into a symbolic token sequence and embedded using a transformer-based sentence embedding model, followed by flow-level aggregation. The resulting embeddings are evaluated using XGBoost for binary VPN mode classification and multiclass traffic-type classification. Experiments are conducted on the ISCX VPN-nonVPN dataset using repeated balanced holdout validation. Results show that the proposed representation enables accurate discrimination between VPN and non-VPN traffic and supports traffic-type identification under both VPN and non-VPN settings.
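The representation step described above can be illustrated with a minimal sketch. The abstract does not specify the tokenization scheme, the embedding model, or the aggregation operator, so everything here is an assumption made for illustration: payload bytes become hexadecimal tokens, a hashed bag-of-tokens vector stands in for the transformer-based sentence embedding, and mean pooling over the first M packets stands in for the flow-level aggregation. The function names and parameters (`payload_to_tokens`, `toy_embed`, `flow_embedding`, `max_bytes`, `dim`) are hypothetical.

```python
import hashlib
from typing import List, Sequence


def payload_to_tokens(payload: bytes, max_bytes: int = 16) -> List[str]:
    """Turn a packet payload into symbolic tokens.

    ASSUMPTION: one two-character hex token per byte, truncated to
    max_bytes. The paper's actual token vocabulary may differ.
    """
    return [f"{b:02x}" for b in payload[:max_bytes]]


def toy_embed(tokens: Sequence[str], dim: int = 8) -> List[float]:
    """Stand-in for a sentence embedding model (NOT a transformer).

    Each token is hashed into one of `dim` buckets; the bucket counts
    are then normalized to sum to 1. An empty token list gives zeros.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("ascii")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def flow_embedding(packets: Sequence[bytes], m: int = 4, dim: int = 8) -> List[float]:
    """Aggregate the embeddings of the first m packets of a flow.

    ASSUMPTION: mean pooling is used as the flow-level aggregation.
    Packets after the first m are ignored, which is what makes the
    representation usable for early classification.
    """
    first = list(packets[:m])
    if not first:
        return [0.0] * dim
    embs = [toy_embed(payload_to_tokens(p), dim) for p in first]
    return [sum(col) / len(embs) for col in zip(*embs)]


# Hypothetical payloads of one bidirectional flow, in arrival order.
flow = [b"\x16\x03\x01\x02", b"\x16\x03\x03", b"\x17\x03\x03\x00\x20", b"", b"\xff"]
vec = flow_embedding(flow, m=3)
print(len(vec))
```

In a full pipeline, one such vector per flow would be passed to a classifier such as XGBoost, once for the binary VPN versus non-VPN task and once for the multiclass traffic-type task. Because only the first M packets contribute, packets that arrive later in the flow do not change the vector.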

Published
May 25, 2026
FERNANDES, Marcelo A. C. Symbolic Flow Representation Based on the First-M Packets for Early Traffic Classification. In: SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 44., 2026, Praia do Forte/BA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1317-1330. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2026.19409.
