Efficient Processing of Analytical Queries Extended with Similarity Predicates in Spark
Abstract
An image data warehouse extends a conventional data warehouse so that it can also handle images, represented by feature vectors and attributes used for similarity search. A resulting challenge is processing analytical queries extended with similarity predicates, because these queries have a high computational cost. In this paper, we propose BrOmnImg, a method that efficiently addresses this challenge using the Spark framework. Compared with the closest related method, BrOmnImg provided performance gains of up to 65.49%.