A Fast Word2Vec Implementation on Manycore Architectures for Text Representation and Its Applications
Abstract
Word embedding has made it possible to work with semantics in any application that processes text documents. Through algorithms that implement this technique, such as Word2Vec, it is possible to measure the similarity between words, paragraphs, and even whole documents. However, generating word embeddings still has a high computational cost. Several parallel algorithms have been proposed in recent years to address this problem, but their performance gains have ranged from 2 to 20 times over the original implementations. Manycore architectures have proven able to scale such algorithms more efficiently. Since the accuracy of word embedding generation depends on large amounts of data (Big Data), new scalable parallel algorithms must be developed to handle this volume (billions of words). Developing scalable parallel algorithms is one of the most complex and difficult tasks, so in this work we focus on exploiting parallelism in text representation and its applications, using Word2Vec for text representation.
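To make the two ideas in the abstract concrete, the following is a minimal sketch (not the implementation evaluated in this work) of the skip-gram negative-sampling update that each thread applies asynchronously in the Hogwild-style parallelization used by the original Word2Vec code, together with a cosine-similarity query over the learned vectors. All names, dimensions, and the random toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(W_in, W_out, center, context, negatives, lr=0.025):
    """One skip-gram negative-sampling update for a single (center, context)
    pair; in Hogwild-style parallel Word2Vec each thread applies such updates
    asynchronously, without locks, on the shared embedding matrices."""
    v = W_in[center]                        # input (word) vector of the center word
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]                     # output (context) vector
        g = sigmoid(np.dot(u, v)) - label   # gradient of the logistic loss
        grad_v += g * u
        W_out[word] -= lr * g * v           # in-place update of the output vector
    W_in[center] -= lr * grad_v             # in-place update of the input vector

def most_similar(W_in, query, topn=5):
    """Rank all rows of the embedding matrix by cosine similarity to one row."""
    unit = W_in / np.maximum(np.linalg.norm(W_in, axis=1, keepdims=True), 1e-9)
    sims = unit @ unit[query]
    order = np.argsort(-sims)
    return [(int(i), float(sims[i])) for i in order if i != query][:topn]

# Toy usage: random vectors stand in for a model trained on a real corpus.
rng = np.random.default_rng(0)
vocab, dim = 1000, 100
W_in = (rng.random((vocab, dim)) - 0.5) / dim   # initialization used by the original C code
W_out = np.zeros((vocab, dim))
sgns_update(W_in, W_out, center=3, context=17, negatives=[5, 42, 99])
print(most_similar(W_in, query=3))
```

The lock-free, in-place updates are what make this algorithm straightforward to parallelize across threads, and also what make it memory-bandwidth-bound on manycore architectures; a production implementation would additionally batch updates and draw negatives from the smoothed unigram distribution.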