Paralelização do Algoritmo de Indexação de Dados Multimídia Baseado em Quantização
Resumo
Nesse artigo é apresentada uma paralelização eficiente do algoritmo de busca por similaridade Product Quantization Approximate Nearest Neighbor Search (PQANNS). Esse método pode responder consultas com uma demanda reduzida de memória e, juntamente com a paralelização proposta, pode lidar de forma eficiente com grandes bases de dados. A execução utilizando 128 nós/3584 núcleos de CPU foi capaz de atingir uma eficiência do paralelismo de 0.97 em uma base de dados contendo 256 bilhões de descritores SIFT.
Referências
Bahmani, B., Goel, A., and Shinde, R. (2012). Efficient distributed locality sensitive hashing. In Proceedings of the 21st ACM international conference on Information and knowledge management, pages 2174–2178. ACM.
Beygelzimer, A., Kakade, S., and Langford, J. (2006). Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on Machine learning, pages 97–104. ACM.
Böhm, C., Berchtold, S., and Keim, D. A. (2001). Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys (CSUR), 33(3):322–373.
Douze, M. and Jégou, H. (2014). The yael library. In Proceedings of the 22nd ACM international conference on Multimedia, pages 687–690. ACM.
Friedman, J. H., Bentley, J. L., and Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3):209–226.
Gionis, A., Indyk, P., Motwani, R., et al. (1999). Similarity search in high dimensions via hashing. In Vldb, volume 99, pages 518–529.
Jain, M. (2014). Enhanced image and video representation for visual recognition. PhD thesis, Université Rennes 1.
Jegou, H., Douze, M., and Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117–128.
Muja, M. and Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1), 2(331340):2.
Silpa-Anan, C. and Hartley, R. (2008). Optimised kd-trees for fast image descriptor matching. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE.
Stupar, A., Michel, S., and Schenkel, R. (2010). Rankreduce-processing k-nearest neighbor queries on top of mapreduce. Large-Scale Distributed Systems for Information Retrieval, 15.
Valle, E., Cord, M., and Philipp-Foliguet, S. (2008). High-dimensional descriptor indexing for large multimedia databases. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 739–748. ACM.