DIP-BR: An Open and Network-Based Dataset of Brazilian Patents

Abstract


The gap between research institutions and industry is a significant obstacle to technology transfer. Patent analysis is a strategic tool for revealing collaborations between companies and research institutions. In this context, we introduce DIP-BR, an open research dataset of Brazilian intellectual property. We enrich the dataset by applying a deduplication algorithm that standardizes holder names and by using machine learning techniques to classify each patent holder as a research institution, company, or individual. The primary contribution of this work is the resulting enriched dataset, which is structured through network modeling and clustering. DIP-BR serves as a tool for analyzing trends and visualizing the dynamics of innovation in Brazil.

Keywords: Patent, Technology Transfer, Intellectual Property
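
As a rough illustration of the two ideas named in the abstract, name standardization and network modeling, the following minimal Python sketch normalizes holder names and counts co-ownership links. It is not the DIP-BR implementation: the suffix list, the function names `normalize_holder` and `co_ownership_edges`, and the sample records are all assumptions made up for this example.

```python
import re
import unicodedata
from collections import defaultdict
from itertools import combinations

# Hypothetical legal-form suffixes to strip; an assumption, not the DIP-BR list.
LEGAL_SUFFIXES = {"ltda", "sa", "me", "eireli"}


def normalize_holder(name: str) -> str:
    """Return a simplified lowercase key for a patent holder name."""
    # Drop accents by decomposing characters and discarding non-ASCII marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove dots so "S.A." collapses to "sa"; other punctuation becomes a space.
    cleaned = ascii_name.lower().replace(".", "")
    cleaned = re.sub(r"[^a-z0-9]+", " ", cleaned)
    tokens = cleaned.split()
    # Strip trailing legal-form tokens such as "ltda".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def co_ownership_edges(patents):
    """Count how often two normalized holders appear on the same patent."""
    weights = defaultdict(int)
    for holders in patents:
        # Deduplicate holders within one patent, then sort for stable edge keys.
        keys = sorted({normalize_holder(h) for h in holders})
        for a, b in combinations(keys, 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up records for illustration only; not taken from the dataset.
patents = [
    ["Universidade Federal de Minas Gerais", "ACME Tecnologia Ltda."],
    ["UNIVERSIDADE FEDERAL DE MINAS GERAIS", "Acme Tecnologia LTDA"],
    ["João da Silva"],
]
edges = co_ownership_edges(patents)
```

In this sketch, spelling variants of the same holder collapse to a single key, so the two shared patents produce one weighted edge between the university and the company. A real pipeline would likely need fuzzy matching in addition to this exact-key approach.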

Published
2025-09-29
CRUZ, Pablo Vasconcelos da; LIMA, Iuri V. A.; SEUFITELLI, Danilo B.; DALIP, Daniel H.; CAMPOS, Fabricio P. V. de. DIP-BR: An Open and Network-Based Dataset of Brazilian Patents. In: DATASET SHOWCASE WORKSHOP (DSW), 7., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 46-57. DOI: https://doi.org/10.5753/dsw.2025.248102.