Auditing Big Data Applications Using Similarity Hashes and Blockchains

  • Carlos A. R. Oliveira INMETRO
  • Paulo Assumpção Universidade Federal do Rio de Janeiro
  • Pablo Ortiz INMETRO
  • Wilson Melo INMETRO
  • Luiz Carmo INMETRO

Abstract


With the expansion of Big Data applications, ensuring the security and reliability of stored data has become a challenging task. The challenge is especially concerning in the monitoring of critical infrastructures, particularly those with physical assets monitored by sensors and IoT data-collection devices. One alternative is to use blockchains as an auditing mechanism for Big Data applications through the off-chain technique, in which raw data packets are stored in a conventional database system and only a cryptographic digest of the data is written to the blockchain. Although widely applied in the recent literature on the topic, this strategy does not support data auditing in scenarios of partial information loss, where data packets corresponding to subsets of the original packet need to be verified. This paper proposes a data auditing strategy for Big Data applications that employs similarity hashes to extend the functionality of the off-chain model. Used together with cryptographic digests and smart contracts, such hashes make it possible to audit distinct data packets in situations of possible partial loss, and to distinguish an unintentional event from a deliberate fraud attempt. In our experiments, we evaluate the Minhash and Simhash algorithms, and the computational results indicate that Minhash is quite promising for this type of application and can contribute significantly to the robustness of auditing processes in Big Data applications.

Keywords: Big Data, auditing, integrity, blockchain, similarity hash, LSH
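
The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes each data packet is a list of text records, uses SHA-256 as the cryptographic digest, and builds a simple MinHash signature from salted BLAKE2b hashes. The record format, the signature length, the 0.7 threshold, the `audit` decision rule, and the `on_chain` dictionary standing in for a blockchain entry are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import hashlib

NUM_PERM = 128  # assumed signature length, not a value from the paper


def sha256_digest(records):
    """Exact, order-sensitive cryptographic digest of a packet."""
    h = hashlib.sha256()
    for r in records:
        h.update(r.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def _seeded_hash(seed, record):
    """64-bit hash of a record; the salt selects one hash function per seed."""
    d = hashlib.blake2b(record.encode("utf-8"), digest_size=8,
                        salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(d, "little")


def minhash_signature(records, num_perm=NUM_PERM):
    """MinHash signature: the minimum hash over the record set, per seed."""
    items = set(records)
    return [min(_seeded_hash(seed, r) for r in items)
            for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def audit(records, entry, threshold=0.7):
    """Illustrative decision rule (an assumption, not the paper's contract)."""
    if sha256_digest(records) == entry["digest"]:
        return "intact"
    sim = estimate_jaccard(minhash_signature(records), entry["minhash"])
    return "partial loss" if sim >= threshold else "suspect"


# Hypothetical sensor packet and two altered versions of it.
original = [f"sensor-{i % 4};t={i};v={i * 0.5:.1f}" for i in range(200)]
partial = original[:180]                              # 10% of records lost
forged = [r.replace("v=", "v=9") for r in original]   # every value altered

# Stand-in for what would be written on-chain: digest plus signature.
on_chain = {"digest": sha256_digest(original),
            "minhash": minhash_signature(original)}

for name, packet in [("original", original), ("partial", partial),
                     ("forged", forged)]:
    print(name, "->", audit(packet, on_chain))
```

In this sketch the exact digest alone would reject both the partial and the forged packet, while the similarity estimate separates them: the partial packet shares 90% of its records with the original, and the forged one shares none. A tampering that alters only a few records would also score a high similarity, so a real auditing policy would need more than a single threshold.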

Published
21/11/2022
OLIVEIRA, Carlos A. R.; ASSUMPÇÃO, Paulo; ORTIZ, Pablo; MELO, Wilson; CARMO, Luiz. Auditoria de aplicações de Big Data usando Hashes de Similaridade e Blockchains. In: ARTIGOS COMPLETOS - SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SISTEMAS COMPUTACIONAIS (SBESC), 12., 2022, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 32-39. ISSN 2763-9002. DOI: https://doi.org/10.5753/sbesc_estendido.2022.227265.