Deep Learning-Based DLPS: A New Approach to Detecting Exfiltration in HDFS

  • James de Castro Martins, Universidade de Brasília (UnB)
  • Li Weigang, Universidade de Brasília (UnB)
  • Luís Paulo Faina Garcia, Universidade de Brasília (UnB)
  • Gabriel Alves Castro, Universidade de Brasília (UnB)

Abstract

This paper addresses cybersecurity applied to social media, with emphasis on the use of the Hadoop Distributed File System (HDFS) for storing and processing large volumes of data. The objective was to develop a machine learning (ML)-based Data Leakage Prevention System (DLPS) framework that improves precision in identifying data leakage in HDFS structures. To that end, the main categories of cybersecurity approaches for HDFS were identified and compared against the MITRE ATT&CK framework. Research gaps were identified in existing work involving DLPS and machine learning, pointing to the need for solutions that combine the two. A framework based on deep learning applied to Hadoop metadata and logs is proposed as a solution to improve exfiltration detection.

Keywords: HDFS, Security, Deep Learning, Exfiltration, DLPS

Published
18/07/2021
MARTINS, James de Castro; WEIGANG, Li; GARCIA, Luís Paulo Faina; CASTRO, Gabriel Alves. DLPS baseado em Deep Learning: Nova Abordagem para Detecção de Exfiltração em HDFS. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 10., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 229-240. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2021.16144.