Graph Pattern Mining: consolidating models, systems, and abstractions

Resumo


Este texto resume as contribuições da tese intitulada ”Mineração de Padrões em Grafos: consolidando modelos, sistemas e abstrações”, aprovada no Programa de Pós-Graduação em Ciência da Computação da Universidade Federal de Minas Gerais (DCC/UFMG). Mineração de Padrões em Grafos (Graph Pattern Mining, GPM) refere-se a uma classe de problemas que envolve o processamento de subgrafos extraídos de grafos maiores. As aplicações para algoritmos de GPM incluem a consulta de subgrafos com propriedades específicas de interesse, a identificação de estruturas de motivos em redes biológicas, entre outras. Os algoritmos de GPM são desafiadores de desenvolver e, assim, sistemas de GPM de uso geral surgem como uma solução para melhorar a experiência do usuário com tais algoritmos. Nesta tese, propomos um modelo baseado em primitivas para representar algoritmos de GPM, um sistema distribuído que implementa este modelo, e um extenso estudo experimental de algoritmos populares usados em sistemas de GPM. Demonstramos empiricamente a eficácia do modelo proposto, mostrando um desempenho competitivo em relação aos concorrentes, sem sacrificar a expressividade dos algoritmos.

Palavras-chave: data mining, graph mining, frequent subgraph mining, parallel and distributed systems, spark, big data

Referências

Agrawal, M., Zitnik, M., and Leskovec, J. (2018). Large-scale analysis of disease pathways in the human interactome. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 23:111–122.

Benson, A. R., Gleich, D. F., and Leskovec, J. (2016). Higher-order organization of complex networks. Science.

Bindschaedler, L., Malicevic, J., Lepers, B., Goel, A., and Zwaenepoel, W. (2021). Tesseract: Distributed, General Graph Pattern Mining on Evolving Graphs, page 458–473. Association for Computing Machinery, New York, NY, USA.

Chen, H., Liu, M., Zhao, Y., Yan, X., Yan, D., and Cheng, J. (2018). G-miner: An efficient task-oriented graph mining system. In Proceedings of the Thirteenth EuroSys Conference, EuroSys ’18.

Chen, X. and Arvind (2022). Efficient and scalable graph pattern mining on GPUs. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pages 857–877, Carlsbad, CA. USENIX Association.

Dias, V., Ferraz, S., Vadlamani, A., Erfanian, M., Teixeira, C. H., Guedes, D., Meira, W., and Parthasarathy, S. (2023). Graph pattern mining paradigms: Consolidation and renewed bearing. In 2023 31st IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC).

Dias, V., Teixeira, C. H. C., Guedes, D., Meira Jr., W., and Parthasarathy, S. (2019). Fractal: A general-purpose graph pattern mining system. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD).

Elbassuoni, S. and Blanco, R. (2011). Keyword search over rdf graphs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 237–242, New York, NY, USA. ACM.

Elseidy, M., Abdelhamid, E., Skiadopoulos, S., and Kalnis, P. (2014). Grami: Frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endow., 7(7):517–528.

Hooi, B., Shin, K., Lamba, H., and Faloutsos, C. (2020). Telltail: Fast scoring and detection of dense subgraphs. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):4150–4157.

Jamshidi, K., Mahadasa, R., and Vora, K. (2020). Peregrine: A pattern-aware graph mining system. In Proceedings of the Fifteenth European Conference on Computer Systems, EuroSys ’20, New York, NY, USA. Association for Computing Machinery.

Leon-Suematsu, Y. I., Inui, K., Kurohashi, S., and Kidawara, Y. (2011). Web Spam Detection by Exploring Densely Connected Subgraphs. In 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, volume 1, pages 124–129.

Mawhirter, D. and Wu, B. (2019). Automine: Harmonizing high-level abstraction and high performance for graph mining. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, pages 509–523, New York, NY, USA. ACM.

Meng, C., Mouli, S. C., Ribeiro, B., and Neville, J. (2018). Subgraph pattern neural networks for high-order graph evolution prediction.

Qin, H., Li, R.-H., Wang, G., Qin, L., Cheng, Y., and Yuan, Y. (2019). Mining periodic cliques in temporal networks. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1130–1141.

Teixeira, C. H. C., Fonseca, A. J., Serafini, M., Siganos, G., Zaki, M. J., and Aboulnaga, A. (2015). Arabesque: A system for distributed graph mining. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP ’15, pages 425–440.

Ugander, J., Backstrom, L., and Kleinberg, J. (2013). Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. In WWW.

Yang, Y., Yan, D., Wu, H., Cheng, J., Zhou, S., and Lui, J. C. (2016). Diversified temporal subgraph pattern mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 1965–1974, New York, NY, USA. Association for Computing Machinery.

Zhao, H., Zhou, Y., Song, Y., and Lee, D. L. (2019). Motif enhanced recommendation over heterogeneous information network. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, pages 2189–2192, New York, NY, USA. ACM.
Publicado
14/10/2024
DIAS, Vinícius; GUEDES, Dorgival. Graph Pattern Mining: consolidating models, systems, and abstractions. In: CONCURSO DE TESES E DISSERTAÇÕES (CTDBD) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39. , 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024 . p. 190-195. DOI: https://doi.org/10.5753/sbbd_estendido.2024.240515.