Indexing Health Informatics scientific articles through competition among feature extraction techniques

  • Fabio Teixeira UNIFESP
  • Fernando S. Sousa UNIFESP
  • Gabriela Denise Araujo UNIFESP
  • Felipe Mancini IFSP
  • Luciano V. de Araujo USP
  • Ivan T. Pisa UNIFESP

Abstract


The objective of this study was to develop an automated mechanism for indexing scientific articles in the interdisciplinary domain of Health Informatics. The work included building a database of 10,800 titles and abstracts of scientific articles, distributed uniformly across the domains of Health Informatics, Computer Science, and Health. Performance was evaluated with the F0.5 score, which reached 66%. Although the articles submitted to the indexing task belong to an interdisciplinary scope, the proposed method was able to characterize them according to their area of interest, with a satisfactory accuracy rate.
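
For readers unfamiliar with the measure, the F-beta score is usually defined as (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall; with beta = 0.5, precision counts more than recall. The sketch below assumes that standard definition, since the abstract does not state the formula. The function name `f_beta` and the precision and recall values in the usage line are hypothetical illustrations, not figures from the study.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical inputs, chosen only to show the precision-heavy weighting.
example_score = f_beta(precision=0.70, recall=0.50)
print(f"F0.5 = {example_score:.3f}")
```

Setting `beta=1.0` in the same function gives the familiar harmonic mean of precision and recall, which makes it easy to see how much the choice of beta shifts the score.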

Published
16/07/2012
TEIXEIRA, Fabio; SOUSA, Fernando S.; ARAUJO, Gabriela Denise; MANCINI, Felipe; ARAUJO, Luciano V. de; PISA, Ivan T. Indexação de artigos científicos de Informática em Saúde por meio da competição de técnicas de extração de características. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 12., 2012, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 82-91. ISSN 2763-8952.