MH-1M: One of The Most Comprehensive and Up-to-Date Dataset for Advanced Android Malware Detection

  • Hendrio Bragança UFAM
  • Vanderson Rocha UFAM
  • Joner Assolin UFAM
  • Diego Kreutz UNIPAMPA
  • Eduardo Feitosa UFAM

Abstract


We introduce MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, covering diverse features and an extensive set of metadata. To assess malware precisely, we use the VirusTotal API, which integrates multiple detection methods, to obtain reliable labels. Our GitHub repository gives users access to the processed dataset and its associated metadata, totaling over 400GB, including the full outputs of the feature extraction process and the VirusTotal metadata files. Our findings underscore the value of MH-1M as a resource for understanding the evolving malware landscape.

Published
16 September 2024
BRAGANÇA, Hendrio; ROCHA, Vanderson; ASSOLIN, Joner; KREUTZ, Diego; FEITOSA, Eduardo. MH-1M: One of The Most Comprehensive and Up-to-Date Dataset for Advanced Android Malware Detection. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 24., 2024, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 843-849. DOI: https://doi.org/10.5753/sbseg.2024.241632.
