Enhancing APT Detection with Synthetic Data Generation Based on GAN-Transformers
Abstract
This work investigates the generation of synthetic data for Advanced Persistent Threats (APTs) using Generative Adversarial Networks (GANs) adapted to time series. Because APTs are stealthy and unfold as sequences of events, data generation methods that ignore temporal dynamics are insufficient. To address this limitation, the study explores the Transformer Time-Series Conditional GAN (TTS-CGAN) architecture, originally proposed for biosignals, and proposes specific adaptations for generating malicious network traffic flows. The process includes data modeling from the DAPT2020 dataset, architectural adjustments to enhance capacity and diversity, and validation of the synthetic data through qualitative and quantitative metrics and through the performance of machine learning models trained on real, synthetic, and semi-synthetic datasets. Results indicate that the synthetic data generated by the TTS-CGAN can improve APT detection performance, demonstrating the viability and benefits of the proposed approach.
Published
2025-09-01
How to Cite
COSSETIN NETO, Alfredo; PREGARDIER, Rafel C.; SANTOS, Carlos R. P. dos; FULBER-GARCIA, Vinicius; SILVA, Luis A. L. Enhancing APT Detection with Synthetic Data Generation Based on GAN-Transformers. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 131-146. DOI: https://doi.org/10.5753/sbseg.2025.11378.
