Paralelização do Algoritmo de Indexação de Dados Multimídia Baseado em Quantização

André Fernandes; George Teodoro

doi:10.5753/wscad_estendido.2019.8699

André Fernandes Universidade de Brasília
George Teodoro Universidade Federal de Minas Gerais

DOI: https://doi.org/10.5753/wscad_estendido.2019.8699

Resumo

Nesse artigo é apresentada uma paralelização eficiente do algoritmo de busca por similaridade Product Quantization Approximate Nearest Neighbor Search (PQANNS). Esse método pode responder consultas com uma demanda reduzida de memória e, juntamente com a paralelização proposta, pode lidar de forma eficiente com grandes bases de dados. A execução utilizando 128 nós/3584 núcleos de CPU foi capaz de atingir uma eficiência do paralelismo de 0.97 em uma base de dados contendo 256 bilhões de descritores SIFT.

Referências

Andrade, G., Fernandes, A., Gomes, J. M., Ferreira, R., and Teodoro, G. (2019). Large-scale parallel similarity search with product quantization for online multimedia services. J. Parallel Distrib. Comput., 125:81–92.

Bahmani, B., Goel, A., and Shinde, R. (2012). Efficient distributed locality sensitive hashing. In Proceedings of the 21st ACM international conference on Information and knowledge management, pages 2174–2178. ACM.

Beygelzimer, A., Kakade, S., and Langford, J. (2006). Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on Machine learning, pages 97–104. ACM.

Böhm, C., Berchtold, S., and Keim, D. A. (2001). Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys (CSUR), 33(3):322–373.

Douze, M. and Jégou, H. (2014). The yael library. In Proceedings of the 22nd ACM international conference on Multimedia, pages 687–690. ACM.

Friedman, J. H., Bentley, J. L., and Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3):209–226.

Gionis, A., Indyk, P., Motwani, R., et al. (1999). Similarity search in high dimensions via hashing. In Vldb, volume 99, pages 518–529.

Jain, M. (2014). Enhanced image and video representation for visual recognition. PhD thesis, Université Rennes 1.

Jegou, H., Douze, M., and Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117–128.

Muja, M. and Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1), 2(331340):2.

Silpa-Anan, C. and Hartley, R. (2008). Optimised kd-trees for fast image descriptor matching. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE.

Stupar, A., Michel, S., and Schenkel, R. (2010). Rankreduce-processing k-nearest neighbor queries on top of mapreduce. Large-Scale Distributed Systems for Information Retrieval, 15.

Valle, E., Cord, M., and Philipp-Foliguet, S. (2008). High-dimensional descriptor indexing for large multimedia databases. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 739–748. ACM.