A Time Series Search Engine Based on Information Theory for Smart Cities
Abstract
With growing urban digitalization, data lakes have become essential for storing and processing large volumes of data in smart cities, but their complex governance can turn them into data swamps. In this setting, ubiquitous computing offers a solution by enabling continuous, decentralized, real-time processing of these data, which eases the analysis and integration of information scattered across urban environments. This work proposes an efficient system for automatically identifying correlated time series, combining descriptors based on Information Theory with a vector database. The approach allows series of different lengths to be compared effectively while reducing computational cost. Experimental results show that histograms of ordinal patterns outperform conventional statistical descriptors, demonstrating the method's effectiveness for similarity search in big data environments.
References
Bandt, C. & Pompe, B. (2002), ‘Permutation entropy: A natural complexity measure for time series’, Phys. Rev. Lett. 88, 174102.
Bhattacharyya, A. (1943), ‘On a measure of divergence between two statistical populations defined by their probability distributions’, Bull. Calcutta math. Soc. 35, 99–109.
Fernandes, D., L. L. Moura, D., Santos, G., S. Ramos, G., Queiroz, F. & L. L. Aquino, A. (2023), Towards edge-based data lake architecture for intelligent transportation system, in ‘Proceedings of the Int’l ACM Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks’, MSWiM ’23, ACM, New York, NY, USA, p. 1–8. DOI: 10.1145/3616394.3618270
Fernandes, D., Ramos, G. S., Pinheiro, R. G. & Aquino, A. L. (2024), ‘A multi-start simulated annealing strategy for data lake organization problem’, Applied Soft Computing 160, 111700. DOI: 10.1016/j.asoc.2024.111700
Gorelik, A. (2016), The Enterprise Big Data Lake, O’Reilly Media, Sebastopol, CA, USA.
Grzegorowski, M., Zdravevski, E., Janusz, A., Lameski, P., Apanowicz, C. & Ślęzak, D. (2021), ‘Cost optimization for big data workloads based on dynamic scheduling and cluster-size tuning’, Big Data Research 25, 100203.
Hai, R., Koutras, C., Quix, C. & Jarke, M. (2023), ‘Data lakes: A survey of functions and systems’, IEEE Transactions on Knowledge and Data Engineering 35(12), 12571–12590. DOI: 10.1109/TKDE.2023.3270101
Martínez-Durive, O. E., Mishra, S., Ziemlicki, C., Rubrichi, S., Smoreda, Z. & Fiore, M. (2023), ‘The netmob23 dataset: A high-resolution multi-region service-level mobile data traffic cartography’.
Pan, J. J., Wang, J. & Li, G. (2024), ‘Survey of vector database management systems’, The VLDB Journal 33(5), 1591–1615. DOI: 10.1007/s00778-024-00864-x
Pessa, A. A. B. & Ribeiro, H. V. (2021), ‘ordpy: A python package for data analysis with permutation entropy and ordinal network methods’, Chaos: An Interdisciplinary Journal of Nonlinear Science 31(6). DOI: 10.1063/5.0049901
Ramos, G. S., Fernandes, D., Coelho, J. A. P. d. M. & Aquino, A. L. L. (2023), Toward Data Lake Technologies for Intelligent Societies and Cities, Springer International Publishing, Cham, pp. 3–29.
Saeedan, M. & Eldawy, A. (2022), Spatial parquet: a column file format for geospatial data lakes, in ‘Proceedings of the 30th International Conference on Advances in Geographic Information Systems’, SIGSPATIAL ’22, ACM, p. 1–4. DOI: 10.1145/3557915.3561038
Sawadogo, P. & Darmont, J. (2020), ‘On data lake architectures and metadata management’, Journal of Intelligent Information Systems 56(1), 97–120. DOI: 10.1007/s10844-020-00608-7
Tang, X., Liu, W., Wu, S., Yao, C., Yuan, G., Ying, S. & Chen, G. (2025), ‘Queryartisan: Generating data manipulation codes for ad-hoc analysis in data lakes’, Proc. VLDB Endow. 18(2), 108–116. DOI: 10.14778/3705829.3705832
Wang, J., Yi, X., Guo, R., Jin, H., Xu, P., Li, S., Wang, X., Guo, X., Li, C., Xu, X., Yu, K., Yuan, Y., Zou, Y., Long, J., Cai, Y., Li, Z., Zhang, Z., Mo, Y., Gu, J., Jiang, R., Wei, Y. & Xie, C. (2021), ‘Milvus: A purpose-built vector data management system’, Proceedings of the 2021 International Conference on Management of Data.
Weng, S., Tan, W., Ou, B. & Pan, J.-S. (2021), ‘Reversible data hiding method for multi-histogram point selection based on improved crisscross optimization algorithm’, Information Sciences 549, 13–33.
Yu, H., Cai, H., Liu, Z., Xu, B. & Jiang, L. (2022), ‘An automated metadata generation method for data lake of industrial wot applications’, IEEE Transactions on Systems, Man, and Cybernetics: Systems 52(8), 5235–5248.
Published
20/07/2025
How to Cite
SANTOS, Jordan A.; FERNANDES, Danilo; AQUINO, Andre L. L. Um motor de busca para séries temporais baseado em Teoria da Informação para Cidades Inteligentes. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO UBÍQUA E PERVASIVA (SBCUP), 17., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 81-90. ISSN 2595-6183. DOI: https://doi.org/10.5753/sbcup.2025.8946.
