Automatic Identification of Vulnerable Storage Buckets with GENBUCKET
Abstract
This paper presents GENBUCKET, a modular tool for generating and validating cloud bucket names using modern generative models. GENBUCKET supports LSTM, Transformer, and GPT models, trained on customizable datasets to capture different naming patterns. The tool automatically generates candidate names, checks whether they exist via DNS, classifies them via HTTP, and analyzes public buckets for vulnerabilities. With validated data, GENBUCKET achieved a hit rate of up to 21.73%, more than ten times the best previously known result, and identified dozens of vulnerable buckets. By integrating generation, validation, and analysis, GENBUCKET contributes to the automated detection of misconfigurations in cloud services.
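The validation stages named in the abstract (DNS existence check, then HTTP classification) could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ENDPOINT` template, the function names, the category labels, and the status-code mapping (200 as public, 403 as private, 404 as nonexistent) are all hypothetical choices made for the example. Some providers may answer DNS queries for any name, so a real checker would likely need provider-specific signals.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional

# Hypothetical endpoint template; real providers and regions differ.
ENDPOINT = "{name}.s3.amazonaws.com"


def dns_exists(host: str) -> bool:
    """Return True if the host name resolves (assumed existence signal)."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def http_status(host: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET on the host root, or None on network failure."""
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 and 404 arrive as exceptions in urllib.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def classify(status: Optional[int]) -> str:
    """Map an HTTP status to a coarse bucket category (assumed mapping)."""
    if status == 200:
        return "public"
    if status == 403:
        return "private"
    if status == 404:
        return "nonexistent"
    return "unknown"


def triage(
    names: Iterable[str],
    resolve: Callable[[str], bool] = dns_exists,
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, str]:
    """Run the DNS check first, then classify surviving candidates via HTTP."""
    results: Dict[str, str] = {}
    for name in names:
        host = ENDPOINT.format(name=name)
        if not resolve(host):
            results[name] = "nonexistent"
            continue
        results[name] = classify(fetch(host))
    return results
```

The resolver and fetcher are passed in as parameters so the triage logic can be exercised with stubs, without sending traffic to any real service. Candidate names would come from the generative model, which this sketch does not cover.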
Published
September 1, 2025
How to Cite
BAZÉ, Milton; FABRIS, José; PAULA, Fabrício S. de; SILVA, Carlos Alberto da; FERREIRA, Ronaldo A. Identificação Automática de Buckets de Armazenamento Vulneráveis com GENBUCKET. In: SIMPÓSIO BRASILEIRO DE CIBERSEGURANÇA (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 465-481. DOI: https://doi.org/10.5753/sbseg.2025.11406.
