K-Nearest Neighbors based on the Nk Interaction Graph

Gustavo F. C. de Castro; Renato Tinós

doi:10.5753/eniac.2022.227174

Gustavo F. C. de Castro USP
Renato Tinós USP

DOI: https://doi.org/10.5753/eniac.2022.227174

Resumo

The K-Nearest Neighbors (KNN) is a simple and intuitive nonparametric classification algorithm. In KNN, the K nearest neighbors are determined according to the distance to the example to be classified. Generally, the Euclidean distance is used, which facilitates the formation of hyper-ellipsoid clusters. In this work, we propose using the Nk interaction graph to return the K-nearest neighbors in KNN. The Nk interaction graph, originally used in clustering, is built based on the distance between examples and spatial density in small groups formed by k examples of the training dataset. By using the distance combined with the spatial density, it is possible to form clusters with arbitrary shapes. We propose two variations of the KNN based on the Nk interaction graph. They differ in the way in which the vertices associated with the N examples of the training dataset are visited. The two proposed algorithms are compared to the original KNN in experiments with datasets with different properties.

Referências

ALTMAN, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression", The American Statistician, 46(3): 175-185.

AHA, D. W.; KIBLER, D. & ALBERT, M.K. (1991). "Instance-based learning algorithms", Machine Learning, 6(1): 37-66.

DUA, D. & GRAFF, C. (2019). "UCI Machine Learning Repository", [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science.

ESTER, M.; KRIEGEL, H.-P.; SANDER, J. & XU, X. (1996), "A density-based algorithm for discovering clusters in large spatial databases with noise", In the Proc. of the 2nd ACM Int. Conf. Knowl. Discovery Data Min. (KDD), 226-231.

FIX, E. (1985). "Discriminatory analysis: nonparametric discrimination, consistency properties", Technical Report, USAF School of Aviation Medicine.

FRÄNTI, P. & SIERANOJA, S. (2018). "K-means properties on six clustering benchmark datasets", Applied Intelligence, 48 (12): 4743-4759.

KOTSIANTIS, S. B.; ZAHARAKIS, I. & PINTELAS, P. (2007). "Supervised machine learning: A review of classification techniques", Emerging artificial intelligence applications in computer engineering, 160(1): 3-24.

MACQUEEN, J. (1967) Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1, 281-297.

MORAES, J. C. B. (2020). "Busca por similaridade utilizando grafo de interações NK", Dissertação de Mestrado em Computação Aplicada, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo.

MORAES, J. C. B., & TINÓS, R. (2020). "Busca por Similaridade usando o Grafo de Interação NK", Nos Anais do XVII Encontro Nacional de Inteligência Artificial e Computacional, 222-233.

RODRIGUEZ, A. & LAIO, A. (2014). "Clustering by fast search and find of density peaks," Science, 344(6191): 1492-1496.

TINÓS, R.; ZHAO, L.; CHICANO, F. & WHITLEY, D. (2018). "NK hybrid genetic algorithm for clustering", IEEE Transactions on Evolutionary Computation, 22(5): 748-761.

K-Nearest Neighbors based on the Nk Interaction Graph

Resumo

Referências

Artigos mais lidos do(s) mesmo(s) autor(es)