Automatic Identification of Vulnerable Storage Buckets with GENBUCKET
Abstract
This paper presents GENBUCKET, a modular tool that uses modern generative models to generate and validate cloud storage bucket names. GENBUCKET supports LSTM, Transformer, and GPT models trained on customizable datasets to capture diverse naming patterns. It automatically generates candidate names, verifies their existence via DNS, classifies them via HTTP, and analyzes publicly accessible buckets for security vulnerabilities. Using validated data, GENBUCKET achieved hit rates of up to 21.73%, more than ten times the best previously reported result, and uncovered dozens of vulnerable buckets. By integrating generation, validation, and analysis in a single pipeline, GENBUCKET enables automated detection of misconfigurations in cloud storage services.
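The validation stages named above (syntactic filtering of generated names, a DNS lookup, an anonymous HTTP probe, and the resulting hit rate) can be illustrated with a minimal Python sketch. This is not GENBUCKET's implementation, and every function name is hypothetical. The name filter covers only a subset of Amazon S3's published bucket-naming rules. The status-code mapping assumes S3-like responses: 200 means anonymously accessible, 403 means access denied, and 404 means the bucket does not exist. The hit-rate definition (existing buckets divided by probed candidates) is an assumption about how the reported figure is computed. A successful DNS lookup is only informative for endpoints without wildcard DNS, since wildcard endpoints resolve any name. That is why the HTTP step is kept as the final arbiter.

```python
import re
import socket
import urllib.error
import urllib.request
from typing import Iterable, Optional

# Subset of S3 general-purpose bucket naming rules (assumption: other
# providers differ): 3-63 chars, lowercase letters/digits/dots/hyphens,
# starts and ends with a letter or digit, no "..", not IPv4-shaped.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Cheap syntactic filter applied to generated candidates before probing."""
    return (
        _NAME_RE.fullmatch(name) is not None
        and ".." not in name
        and _IPV4_RE.fullmatch(name) is None
    )


def resolves(host: str) -> bool:
    """True if getaddrinfo finds an address for the host.

    Only meaningful for endpoints that do not use wildcard DNS.
    """
    try:
        socket.getaddrinfo(host, 443)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send an anonymous HEAD request; return the status code, or None on network errors."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return err.code
    except OSError:  # URLError, timeouts, TLS failures, resets
        return None


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status to a label (assumed S3-like semantics)."""
    if status == 200:
        return "public"       # anonymous request succeeded
    if status == 403:
        return "private"      # bucket exists, access denied
    if status == 404:
        return "nonexistent"  # e.g. NoSuchBucket
    return "unknown"          # network error, redirect, throttling, ...


def check_candidate(name: str, host_suffix: str) -> str:
    """Hypothetical pipeline: syntax filter -> DNS -> anonymous HTTP probe.

    host_suffix (e.g. a provider endpoint domain) is an assumed parameter.
    Names containing dots may fail TLS validation over HTTPS and end up "unknown".
    """
    if not is_valid_bucket_name(name):
        return "invalid"
    host = f"{name}.{host_suffix}"
    if not resolves(host):
        return "nonexistent"
    return classify_status(probe(f"https://{host}/"))


def hit_rate(labels: Iterable[str]) -> float:
    """Fraction of probed candidates that exist (public or private)."""
    items = list(labels)
    if not items:
        return 0.0
    hits = sum(label in ("public", "private") for label in items)
    return hits / len(items)
```

For example, `hit_rate(["public", "nonexistent", "private", "nonexistent"])` evaluates to `0.5`. The syntactic filter runs first because it costs nothing, while each DNS lookup and HTTP request adds latency and load on the provider.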
Published
2025-09-01
How to Cite
BAZÉ, Milton; FABRIS, José; PAULA, Fabrício S. de; SILVA, Carlos Alberto da; FERREIRA, Ronaldo A. Automatic Identification of Vulnerable Storage Buckets with GENBUCKET. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 465-481. DOI: https://doi.org/10.5753/sbseg.2025.11406.
