Automatic Multi-labeling of Health Web Pages: a preliminary evaluation of human perception

  • Fernando S. Sousa UNIFESP
  • Felipe Mancini IFSP
  • Fabio O. Teixeira UNIFESP
  • Gabriela D. de Araujo UNIFESP
  • Fátima de L. dos S. Nunes USP
  • Ivan T. Pisa UNIFESP

Abstract


Lay people have difficulty finding health information on the Web. This study evaluated the adequacy of automatic multi-label suggestions for health web pages written in Brazilian Portuguese. We collected 57 health web pages and asked 21 volunteers to evaluate them. We measured recall, consensus among evaluators, and consensus between evaluators and the automatic classifiers. Recall reached 100%, with high consensus among evaluators on the 5 most relevant categories, suggesting that automatic multi-labeling of health web pages helps lay people retrieve information.

Published
2012-07-16
SOUSA, Fernando S.; MANCINI, Felipe; TEIXEIRA, Fabio O.; ARAUJO, Gabriela D. de; NUNES, Fátima de L. dos S.; PISA, Ivan T. Automatic Multi-labeling of Health Web Pages: a preliminary evaluation of human perception. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 12., 2012, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 121-129. ISSN 2763-8952.