Autoencoder-based feature extraction of spatial panel data for Brazilian agricultural heterogeneity cluster analysis

Flávio E. de O. Santos (UFS); Marcos A. S. da Silva (Embrapa); Leonardo N. Matos (UFS); Márcia H. G. Dompieri (Embrapa); Fábio R. de Moura (UFS)

doi:10.5753/erbase.2022.228737

Abstract

Brazilian agricultural production is highly diverse across space, which makes it challenging to design territorial public policies that promote sustainable development. This article proposes a new approach to clustering Brazilian municipalities according to their agricultural production. It combines deep-learning feature extraction based on autoencoders with clustering based on k-means and Self-Organizing Maps. We clustered panel data built from IBGE’s annual estimates of Brazilian agricultural production between 1999 and 2018. Measured against the adopted ground truth, combining the autoencoder with Self-Organizing Maps and k-means produced better clusters than applying k-means to the raw data. This demonstrates that simple stacked autoencoders can reduce dimensionality and create, in their latent layer, a new feature space in which the data can be analyzed and clustered.
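The pipeline summarized above (autoencoder feature extraction, then clustering with Self-Organizing Maps and k-means, compared with k-means on the raw data) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The layer sizes, the 2-D latent layer, the Adam optimizer, the SOM grid and learning schedule, and the two-level scheme in which k-means clusters the SOM prototypes are all assumptions. The helper names (`kmeans`, `train_autoencoder`, `encode`, `train_som`, `som_kmeans`) are hypothetical. The data is a synthetic stand-in for the IBGE panel (municipalities × years × crops), flattened to one feature row per municipality.

```python
import numpy as np

# Sketch under assumed architecture and hyperparameters; not the paper's code.


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of x; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((x[:, None] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = ((x[:, None] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids


def train_autoencoder(x, hidden=16, latent=2, epochs=500, lr=0.01, seed=0):
    """Stacked autoencoder x -> hidden -> latent -> hidden -> x (tanh hidden layers,
    linear code and output), full-batch Adam on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    sizes = [x.shape[1], hidden, latent, hidden, x.shape[1]]
    acts = ["tanh", "linear", "tanh", "linear"]
    params = []  # [W0, b0, W1, b1, ...]
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        params += [rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)]
    m = [np.zeros_like(p) for p in params]
    v = [np.zeros_like(p) for p in params]
    beta1, beta2, eps, losses = 0.9, 0.999, 1e-8, []
    for t in range(1, epochs + 1):
        outs = [x]  # keep every layer output for backpropagation
        for i, act in enumerate(acts):
            a = outs[-1] @ params[2 * i] + params[2 * i + 1]
            outs.append(np.tanh(a) if act == "tanh" else a)
        err = outs[-1] - x
        losses.append(float(np.mean(err ** 2)))
        grads, delta = [None] * len(params), 2.0 * err / err.size
        for i in reversed(range(len(acts))):
            if acts[i] == "tanh":
                delta = delta * (1.0 - outs[i + 1] ** 2)  # tanh'(a) = 1 - tanh(a)^2
            grads[2 * i], grads[2 * i + 1] = outs[i].T @ delta, delta.sum(axis=0)
            delta = delta @ params[2 * i].T
        for j, g in enumerate(grads):  # Adam update with bias correction
            m[j] = beta1 * m[j] + (1 - beta1) * g
            v[j] = beta2 * v[j] + (1 - beta2) * g ** 2
            params[j] -= lr * (m[j] / (1 - beta1 ** t)) / (np.sqrt(v[j] / (1 - beta2 ** t)) + eps)
    return params, losses


def encode(x, params):
    """Latent features: the first two (encoder) layers of the trained autoencoder."""
    return np.tanh(x @ params[0] + params[1]) @ params[2] + params[3]


def train_som(z, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Online Kohonen SOM; returns one prototype per grid node, shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    protos = z[rng.choice(len(z), size=rows * cols, replace=False)].copy()
    total, step = epochs * len(z), 0
    for _ in range(epochs):
        for i in rng.permutation(len(z)):
            frac = step / total
            lr, sigma = lr0 * (1 - frac), max(sigma0 * (1 - frac), 0.5)
            bmu = ((protos - z[i]) ** 2).sum(axis=1).argmin()  # best-matching unit
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            protos += lr * h[:, None] * (z[i] - protos)  # Gaussian neighbourhood update
            step += 1
    return protos


def som_kmeans(z, k, seed=0):
    """Two-level clustering: k-means on SOM prototypes, samples inherit their BMU's label."""
    protos = train_som(z, seed=seed)
    proto_labels, _ = kmeans(protos, k, seed=seed)
    bmu = ((z[:, None] - protos[None]) ** 2).sum(axis=2).argmin(axis=1)
    return proto_labels[bmu]


# Hypothetical synthetic panel: 120 municipalities x 20 years x 3 crops, 3 true groups.
rng = np.random.default_rng(42)
truth = np.repeat(np.arange(3), 40)
panel = rng.uniform(0, 100, size=(3, 1, 3))[truth] + rng.normal(0, 5, size=(120, 20, 3))
x = panel.reshape(len(panel), -1)            # one row per municipality (60 features)
x = (x - x.mean(axis=0)) / x.std(axis=0)     # z-score each year-crop column
params, losses = train_autoencoder(x)
z = encode(x, params)                        # 2-D latent features
labels = som_kmeans(z, k=3)                  # autoencoder + SOM + k-means
baseline, _ = kmeans(x, k=3)                 # k-means on the raw features
```

Comparing `labels` and `baseline` against `truth` (for instance with an adjusted Rand index or an accuracy computed after matching cluster labels to classes) mirrors the kind of ground-truth comparison the abstract describes. Results on this synthetic panel say nothing about the paper's actual findings.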
