Decentralized Semantic Retrieval of Scientific Documents in P2P Networks: A Research Design
Abstract
This paper presents the design of an ongoing master's project at UDESC on a decentralized P2P architecture for the persistence and semantic retrieval of scientific documents. The proposal combines a DHT overlay for peer discovery and base routing with a QUIC data plane and erasure coding for fault-tolerant storage. On top of this infrastructure, the project investigates distributed semantic routing over approximate neighborhoods, which enables similarity search under churn without a global index. Initial experiments provide evidence of feasibility.
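As an illustration only, the idea of similarity search without a global index can be sketched as greedy forwarding: each peer compares the query embedding with the embeddings advertised by its neighbors and forwards the query to the closest one until no neighbor improves the match. This is a minimal sketch under stated assumptions, not the project's protocol: the names `cosine`, `greedy_route`, `peers`, and `neighbors`, the use of one summary vector per peer, and the hop limit are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_route(peers, neighbors, start, query, max_hops=8):
    """Forward the query to the most similar neighbor until no neighbor improves.

    peers:     hypothetical map from peer id to that peer's summary embedding
    neighbors: hypothetical map from peer id to the ids it knows locally
    Returns the list of peer ids visited, starting with `start`.
    """
    current, path = start, [start]
    for _ in range(max_hops):
        # Only local knowledge is used: the current peer's own neighbor list.
        best = max(neighbors[current],
                   key=lambda p: cosine(peers[p], query),
                   default=current)
        if cosine(peers[best], query) <= cosine(peers[current], query):
            break  # local optimum: no neighbor is closer to the query
        current = best
        path.append(current)
    return path


# Toy overlay of three peers arranged in a line: a - b - c
peers = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
path = greedy_route(peers, neighbors, "a", [0.1, 0.9])
print(path)  # ['a', 'b', 'c']
```

Greedy forwarding can stop at a local optimum, and under churn a neighbor list may be stale; a real design would need fallbacks for both cases, which this sketch does not attempt.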
Published
25 May 2026
How to Cite
GIONGO, Carlos; FIORESE, Adriano. Recuperação Semântica Descentralizada de Documentos Científicos em Redes P2P: um Desenho de Pesquisa. In: TRILHA DE NOVAS IDEIAS E RESULTADOS EMERGENTES EM SI - DESENHOS DE PESQUISA - SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 22., 2026, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 147-153. DOI: https://doi.org/10.5753/sbsi_estendido.2026.249006.
