A Fast Word2Vec Implementation on Manycore Architectures for Text Representation and Its Applications
Abstract
Word embedding has made it possible to work with semantics in any application that processes text documents. Through algorithms that implement this technique, such as Word2Vec, it is possible to measure the similarity between words, paragraphs, and even whole documents. However, generating word embeddings still has a high computational cost. Several parallel algorithms have been proposed in recent years to address this problem, but their performance gains have ranged from 2 to 20 times over the original implementations. Manycore architectures have proven able to scale such algorithms more efficiently. Since the accuracy of word embedding generation depends on large amounts of data (Big Data), new scalable parallel algorithms must be developed to handle this volume (billions of words). Developing scalable parallel algorithms is one of the most complex and difficult tasks, so in this work we focus on exploiting parallelism in text representation and its applications, using Word2Vec for text representation.
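To make the two ideas in the abstract concrete, the following is a minimal sketch (not the implementation evaluated in this work) of the skip-gram negative-sampling update that each thread applies asynchronously in the Hogwild-style parallelization used by the original Word2Vec code, together with a cosine-similarity query over the learned vectors. All names, dimensions, and the random toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(W_in, W_out, center, context, negatives, lr=0.025):
    """One skip-gram negative-sampling update for a single (center, context)
    pair; in Hogwild-style parallel Word2Vec each thread applies such updates
    asynchronously, without locks, on the shared embedding matrices."""
    v = W_in[center]                        # input (word) vector of the center word
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]                     # output (context) vector
        g = sigmoid(np.dot(u, v)) - label   # gradient of the logistic loss
        grad_v += g * u
        W_out[word] -= lr * g * v           # in-place update of the output vector
    W_in[center] -= lr * grad_v             # in-place update of the input vector

def most_similar(W_in, query, topn=5):
    """Rank all rows of the embedding matrix by cosine similarity to one row."""
    unit = W_in / np.maximum(np.linalg.norm(W_in, axis=1, keepdims=True), 1e-9)
    sims = unit @ unit[query]
    order = np.argsort(-sims)
    return [(int(i), float(sims[i])) for i in order if i != query][:topn]

# Toy usage: random vectors stand in for a model trained on a real corpus.
rng = np.random.default_rng(0)
vocab, dim = 1000, 100
W_in = (rng.random((vocab, dim)) - 0.5) / dim   # initialization used by the original C code
W_out = np.zeros((vocab, dim))
sgns_update(W_in, W_out, center=3, context=17, negatives=[5, 42, 99])
print(most_similar(W_in, query=3))
```

The lock-free, in-place updates are what make this algorithm straightforward to parallelize across threads, and also what make it memory-bandwidth-bound on manycore architectures; a production implementation would additionally batch updates and draw negatives from the smoothed unigram distribution.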