Performance of genotypic tools and stacking in predicting HIV-1 subtype C tropism
Abstract
Several tools developed to classify HIV-1 tropism were designed using subtype B strains and may therefore perform poorly on other subtypes. This study evaluated the performance of genotypic algorithms in predicting the tropism of HIV-1 subtype C and applied stacking in search of a better-performing model. Raymond's Rule showed the best overall performance, although Geno2Pheno 0.20 had higher sensitivity. The proposed model performed on par with Geno2Pheno 0.10, with sensitivity and specificity above 90%. Stacking may be useful for improving tropism prediction without requiring new tests.
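As a rough illustration of the stacking idea, the sketch below trains a small logistic-regression meta-learner on the binary calls of three base predictors and then computes sensitivity and specificity. Everything in it is an assumption made for illustration: the toy data, the three tool columns, the function names, and the choice of logistic regression as the meta-learner are hypothetical and are not taken from the study.

```python
import math

# Hypothetical toy data: each row holds the calls of three base tools
# (1 = X4-capable, 0 = R5) for one V3 sequence. Not real study data.
base_calls = [
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],
]
# Hypothetical phenotypic labels (1 = X4-capable, 0 = R5).
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def train_meta_learner(rows, targets, lr=0.5, epochs=3000):
    """Fit logistic-regression weights on base-tool calls by gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(targets)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, targets):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (X4-capable) when the linear score is non-negative, else 0."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z >= 0 else 0


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


weights, bias = train_meta_learner(base_calls, labels)
stacked_calls = [predict(weights, bias, x) for x in base_calls]
sensitivity, specificity = sensitivity_specificity(labels, stacked_calls)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Scoring the meta-learner on its own training rows, as done here, only shows the mechanics. A realistic evaluation would fit the meta-learner on out-of-fold base predictions and report metrics on held-out sequences.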
References
CASHIN, K.; GRAY, L. R.; HARVEY, K. L.; et al. Reliable Genotypic Tropism Tests for the Major HIV-1 Subtypes. Scientific Reports, v. 5, n. 1, p. 1–8, 2015.
CHARIF, D.; LOBRY, J. R. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: BASTOLLA, U.; PORTO, M.; ROMAN, H. E.; et al. (Eds.). Structural approaches to sequence evolution: Molecules, networks, populations. New York: Springer Verlag, 2007, p. 207–232. (Biological and Medical Physics, Biomedical Engineering).
CHAWLA, N. V.; BOWYER, K. W.; HALL, L. O.; et al. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, v. 16, p. 321–357, 2002.
CHIOU, S. H.; FREED, E. O.; PANGANIBAN, A. T.; et al. Studies on the role of the V3 loop in human immunodeficiency virus type 1 envelope glycoprotein function. AIDS research and human retroviruses, v. 8, n. 9, p. 1611–1618, 1992.
GHO - Global Health Observatory | By category | Number of people (all ages) living with HIV - Estimates by country. Available at: [ http://apps.who.int/gho/data/view.main.22100?lang=en ]. Accessed: 25 Oct. 2019.
GRÄF, T.; PINTO, A. R. The increasing prevalence of HIV-1 subtype C in Southern Brazil and its dispersion through the continent. Virology, v. 435, n. 1, p. 170–178, 2013.
HEIDER, D.; DYBOWSKI, J. N.; WILMS, C.; et al. A simple structure-based model for the prediction of HIV-1 co-receptor tropism. BioData Mining, v. 7, p. 14, 2014.
JENSEN, M. A.; COETZER, M.; VAN’T WOUT, A. B.; et al. A Reliable Phenotype Predictor for Human Immunodeficiency Virus Type 1 Subtype C Based on Envelope V3 Sequences. Journal of Virology, v. 80, n. 10, p. 4698–4704, 2006.
KAWASHIMA, S.; OGATA, H.; KANEHISA, M. AAindex: Amino Acid Index Database. Nucleic Acids Research, v. 27, n. 1, p. 368–369, 1999.
KUHN, M.; WING, J.; WESTON, S.; et al. caret: Classification and Regression Training. [s.l.: s.n.], 2019. Available at: [ https://CRAN.R-project.org/package=caret ]. Accessed: 23 Jun. 2020.
LENGAUER, T.; SANDER, O.; SIERRA, S.; et al. Bioinformatics prediction of HIV coreceptor usage. Nature Biotechnology, v. 25, n. 12, p. 1407–1410, 2007.
Los Alamos. HIV Databases. Available at: [ https://www.hiv.lanl.gov/content/index ]. Accessed: 14 Dec. 2019.
MASSO, M.; VAISMAN, I. I. Accurate and efficient gp120 V3 loop structure based models for the determination of HIV-1 co-receptor usage. BMC Bioinformatics, v. 11, p. 494, 2010.
MENEZES, R.; RAPOSO, L. HIV Tropism Ensemble Methods. 2020. Available at: [ https://doi.org/10.5281/zenodo.3905343 ]. Accessed: 23 Jun. 2020.
OZA, N. C.; TUMER, K. Classifier ensembles: Select real-world applications. Information Fusion, v. 9, n. 1, p. 4–20, 2008.
PAGÈS, H.; ABOYOUN, P.; GENTLEMAN, R.; et al. Biostrings: Efficient manipulation of biological strings. [s.l.: s.n.], 2019.
POWERS, D. M. W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. Journal of Machine Learning Technologies, v. 2, n. 1, p. 37–63, 2011.
R CORE TEAM. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2019. Available at: [ https://www.R-project.org/ ]. Accessed: 23 Jun. 2020.
RAYMOND, S.; DELOBEL, P.; MAVIGNER, M.; et al. Correlation between genotypic predictions based on V3 sequences and phenotypic determination of HIV-1 tropism. AIDS (London, England), v. 22, n. 14, p. F11-16, 2008.
RIEMENSCHNEIDER, M.; CASHIN, K. Y.; BUDEUS, B.; et al. Genotypic Prediction of Co-receptor Tropism of HIV-1 Subtypes A and C. Scientific Reports, v. 6, n. 1, p. 1–9, 2016.
SWENSON, L. C.; DÄUMER, M.; PAREDES, R. Next-generation sequencing to assess HIV tropism. Current opinion in HIV and AIDS, v. 7, n. 5, p. 478–485, 2012.
TORGO, L. Data Mining with R, learning with case studies. [s.l.]: Chapman and Hall/CRC, 2010. Available at: [ http://www.dcc.fc.up.pt/ltorgo/DataMiningWithR ]. Accessed: 23 Jun. 2020.
WICKHAM, H. stringr: Simple, Consistent Wrappers for Common String Operations. [s.l.: s.n.], 2019. Available at: [ https://CRAN.R-project.org/package=stringr ]. Accessed: 23 Jun. 2020.