Generating Links for Patent Documents: an Automatic Approach using Computational Intelligence


  • C. M. Souza Pontifical Catholic University of Minas Gerais
  • M. E. Santos Pontifical Catholic University of Minas Gerais
  • M. R. G. Meireles Pontifical Catholic University of Minas Gerais



Word Sense Disambiguation, Keyword Extraction, Link Creation, Patents


Patents are organized into classification systems, which help patent offices and users search for and retrieve these documents. A wide variety of users rely on patent systems and on the information contained in patent documents. However, patents are complex legal documents with a large amount of technical and descriptive detail, which makes the information they contain difficult to identify and analyze. A system that automatically links some of the terms found in a patent to specific knowledge bases would give readers quick access to the concepts behind those terms. This work presents the results of a project whose objective is the automatic generation of links in patent documents. The experiments were conducted with four subgroups of patents from the United States Patent and Trademark Office (USPTO), which uses the Cooperative Patent Classification (CPC) system. Because the patent documents had no keywords, meaningful terms were selected with the χ2 (chi-square) algorithm, applied to the full content of each patent document. Some keywords with more than one meaning were disambiguated with a dedicated algorithm, producing a file with the information used in the experiments. The links were generated from Wikipedia articles and the USPTO patent database. Using the patent database as an additional link destination covers terms for which Wikipedia has no article and provides an alternative source that may help readers understand the documents. The automatically created links are expected to make it easier to access concepts related to the terms in the documents and to understand the information disclosed by the inventors.
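As a minimal sketch of the keyword-selection and link-destination steps described in the abstract, the Python below scores each term with the document-frequency χ2 statistic of that term against one CPC subgroup, then picks a link target that prefers a Wikipedia article and falls back to the patent database. Several assumptions apply. It uses the term-versus-subgroup 2x2 formulation of χ2, and the paper may use a different variant. Documents are pre-tokenized into term sets. The names `top_keywords`, `link_target`, `wiki_titles`, and `patent_index` are hypothetical and stand in for a real Wikipedia title lookup and a USPTO search. Word sense disambiguation is left out because the abstract does not name the algorithm used.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/subgroup contingency table.

    a: documents in the subgroup that contain the term
    b: documents outside the subgroup that contain the term
    c: documents in the subgroup that lack the term
    d: documents outside the subgroup that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def top_keywords(docs: List[Tuple[str, Set[str]]], subgroup: str,
                 k: int) -> List[Tuple[str, float]]:
    """Rank terms by chi-square association with one CPC subgroup.

    docs holds (subgroup label, set of terms from the full patent text) pairs.
    Assumption: only positively associated terms (a*d > b*c) are kept, so the
    keywords characterise the subgroup rather than its complement.
    """
    inside = [terms for label, terms in docs if label == subgroup]
    outside = [terms for label, terms in docs if label != subgroup]
    df_in = Counter(t for terms in inside for t in terms)
    df_out = Counter(t for terms in outside for t in terms)
    scores: Dict[str, float] = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        c, d = len(inside) - a, len(outside) - b
        if a * d > b * c:
            scores[term] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def link_target(keyword: str, wiki_titles: Set[str],
                patent_index: Dict[str, str]) -> Optional[str]:
    """Prefer a Wikipedia article; fall back to a (hypothetical) patent index."""
    title = keyword[:1].upper() + keyword[1:]
    if title in wiki_titles:
        return WIKIPEDIA_BASE + title.replace(" ", "_")
    return patent_index.get(keyword)  # None when neither source covers the term


if __name__ == "__main__":
    docs = [
        ("A", {"battery", "electrode", "cell"}),
        ("A", {"battery", "anode", "cell"}),
        ("B", {"engine", "piston", "cell"}),
        ("B", {"engine", "valve"}),
    ]
    print(top_keywords(docs, "A", 2))  # [('battery', 4.0), ('anode', 1.3333333333333333)]
    print(link_target("battery", {"Battery"}, {}))  # https://en.wikipedia.org/wiki/Battery
    print(link_target("anode", {"Battery"}, {"anode": "PATENT-123"}))  # PATENT-123
```

Keeping only positively associated terms is a design choice. Raw χ2 also rewards terms that are conspicuously absent from a subgroup (here "engine" would score 4.0 for subgroup A), and such terms would make poor link anchors. The identifier "PATENT-123" is a placeholder, not a real patent number.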








How to Cite

Souza, C. M., Santos, M. E., & Meireles, M. R. G. (2019). Generating Links for Patent Documents: an Automatic Approach using Computational Intelligence. Journal of Information and Data Management, 10(3), 117 –.

