An empirical comparison of distance/similarity measures for Natural Language Processing
Text classification is one of the core tasks of Natural Language Processing (NLP). In this area, Graph Convolutional Networks (GCNs) have achieved higher scores than CNNs and other related models. For a GCN, the metric that defines the correlation between words in a vector space plays a crucial role in classification, because it determines the weight of the edge between two words (represented as nodes in the graph). In this study, we empirically investigated the impact of thirteen distance/similarity measures. A representation was built for each document using word embeddings from a word2vec model. In addition, a graph-based representation of five datasets was created for each measure analyzed, in which each word is a node and each edge is weighted by the distance/similarity between the words it connects. Finally, each model was run on a simple graph neural network. The results show that, for text classification with a Graph Convolutional Network, there is no statistically significant difference among the analyzed metrics. Even when external words or external knowledge was incorporated, the results were similar to those of the methods without such incorporation. However, the results indicate that some distance metrics capture context better than others, with Euclidean distance achieving the best values or being statistically similar to the best.
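The abstract's edge-weighting step can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the measure functions, the `edge_weights` helper, and the random stand-in vectors (used here in place of real word2vec embeddings) are all assumptions introduced for the example.

```python
import numpy as np

def euclidean(u, v):
    # Euclidean distance: the measure the study found to perform best
    # (or to be statistically similar to the best).
    return float(np.linalg.norm(u - v))

def cosine_similarity(u, v):
    # Cosine similarity, one of the other common vector-space measures.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def manhattan(u, v):
    # Manhattan (city-block) distance.
    return float(np.sum(np.abs(u - v)))

def edge_weights(embeddings, measure):
    """Build a dense word-word edge-weight matrix from a dict of word vectors.

    Each word becomes a node; each off-diagonal entry W[i, j] is the chosen
    distance/similarity between the vectors of words i and j.
    """
    words = list(embeddings)
    n = len(words)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = measure(embeddings[words[i]], embeddings[words[j]])
    return words, W

# Random vectors stand in for pre-trained word2vec embeddings.
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal(8) for w in ["graph", "network", "text"]}
words, W = edge_weights(vocab, euclidean)
```

Swapping `euclidean` for `cosine_similarity` or `manhattan` yields a different weighted graph over the same nodes, which is the comparison the study performs across its thirteen measures.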