Automatic Generation of Links in Patent Documents

  • C. M. Souza PUC-MG
  • M. E. Santos PUC-MG
  • M. R. G. Meireles PUC-MG

Abstract


Patents are organized into classification systems, which help patent offices and users search for and retrieve these documents. Patent systems and the information in patent documents serve a wide variety of users. In addition to office professionals, such as examiners and analysts, who determine whether an invention satisfies the conditions required for patenting and summarize the content of the document, other users such as inventors, researchers, investors and business managers have a keen interest in understanding the content of patents. However, patents are complex legal documents with a significant number of technical and descriptive details, which makes it difficult to identify and analyze the information they contain. An automatic system that links some of the terms found in patents to specific knowledge bases would provide quick access to the corresponding concepts. This work presents partial results of a project whose objective is the automatic generation of links in patent documents. The experiments were conducted with four subgroups from the United States Patent and Trademark Office (USPTO), which uses the Cooperative Patent Classification (CPC) system. In a first step, since the documents do not have keywords, meaningful terms were selected as link origins using the χ² (chi-square) algorithm. In a later step, once the link destinations were selected, keywords with more than one meaning were disambiguated. The automatically created links are expected to aid the reading of patent texts by making it easier to access concepts related to the terms in the documents and to understand the information disclosed by the inventors.
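
The chi-square (χ²) term-selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenized documents, the group labels, and the function names `chi_square_scores` and `top_terms` are assumptions made for illustration, and the sketch uses the standard 2x2 document-count form of the statistic, keeping only terms that are positively associated with the target group.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Chi-square score of each term against `target` (hypothetical helper).

    docs   -- list of token lists, one per document
    labels -- group label of each document
    Only terms positively associated with `target` are returned.
    """
    n = len(docs)
    n_target = sum(1 for lab in labels if lab == target)
    in_target, in_other = Counter(), Counter()
    for tokens, lab in zip(docs, labels):
        # Count each term once per document (document frequency).
        (in_target if lab == target else in_other).update(set(tokens))

    scores = {}
    for term in set(in_target) | set(in_other):
        a = in_target[term]          # target documents containing the term
        b = in_other[term]           # other documents containing the term
        c = n_target - a             # target documents without the term
        d = (n - n_target) - b       # other documents without the term
        if a * d <= b * c:
            continue                 # not positively associated with target
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_terms(docs, labels, target, k=5):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up toy data: two groups of two documents each.
docs = [
    ["valve", "pressure", "fluid"],
    ["valve", "seal", "fluid"],
    ["antenna", "signal", "fluid"],
    ["antenna", "circuit", "signal"],
]
labels = ["group_a", "group_a", "group_b", "group_b"]
print(top_terms(docs, labels, "group_a", k=3))
```

In this toy data, "valve" appears in every `group_a` document and in no other, so it ranks first, while "fluid" scores lower because it also occurs outside the group. Terms chosen this way would be the candidate link origins; the later disambiguation step is not covered by the sketch.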

Keywords: Disambiguation, Keyword Extraction, Link Creation, Patents

Published
22/10/2018
SOUZA, C. M.; SANTOS, M. E.; MEIRELES, M. R. G. Automatic Generation of Links in Patent Documents. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 6., 2018, São Paulo/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 65-72. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2018.27386.