COCαDA - Large-Scale Protein Interatomic Contact Cutoff Optimization by Cα Distance Matrices
Abstract
Contacts, defined as inter- and intramolecular interactions predicted computationally, are typically detected using Euclidean distance and atom types. However, traditional methods can be computationally expensive, which limits their scalability. We introduce COCαDA (Contact Optimization by Cα Distance Analysis), a novel method that incorporates domain knowledge of amino acids to optimize distance cutoffs, simplifying implementation and improving efficiency. COCαDA outperforms traditional methods such as all-against-all, static cutoff (SC), and Biopython’s NeighborSearch (NS), running on average 2.5x faster than SC and 6x faster than NS. COCαDA is well suited to exploratory and large-scale analyses and is freely available at https://github.com/LBS-UFMG/COCaDA.
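The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind a Cα-distance prefilter, not the COCαDA implementation. It assumes a toy residue representation (name, Cα coordinates, atom coordinates) and two placeholder cutoffs, `CA_CUTOFF` and `ATOM_CUTOFF`, whose values are hypothetical. A single global Cα cutoff stands in for whatever amino-acid-specific cutoffs the method actually derives.

```python
from itertools import combinations
from math import dist

# Hypothetical placeholder values, NOT the cutoffs used by COCαDA.
ATOM_CUTOFF = 4.5   # Å, assumed atom-atom contact distance
CA_CUTOFF = 12.0    # Å, assumed Cα-Cα distance beyond which a residue pair is skipped

# Toy structure: each residue is (name, ca_xyz, [atom_xyz, ...]).
TOY = [
    ("GLY", (0.0, 0.0, 0.0), [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]),
    ("ALA", (3.8, 0.0, 0.0), [(3.8, 0.0, 0.0), (5.0, 0.0, 0.0)]),
    ("LYS", (30.0, 0.0, 0.0), [(30.0, 0.0, 0.0), (31.5, 0.0, 0.0)]),
]


def atom_contacts(i, j, residues, atom_cutoff):
    """Return every atom pair between residues i and j within atom_cutoff."""
    atoms_i, atoms_j = residues[i][2], residues[j][2]
    return {
        (i, a, j, b)
        for a, p in enumerate(atoms_i)
        for b, q in enumerate(atoms_j)
        if dist(p, q) <= atom_cutoff
    }


def all_against_all(residues, atom_cutoff=ATOM_CUTOFF):
    """Baseline: compare the atoms of every residue pair."""
    contacts = set()
    for i, j in combinations(range(len(residues)), 2):
        contacts |= atom_contacts(i, j, residues, atom_cutoff)
    return contacts


def ca_prefiltered(residues, ca_cutoff=CA_CUTOFF, atom_cutoff=ATOM_CUTOFF):
    """Skip residue pairs whose Cα atoms are farther apart than ca_cutoff."""
    contacts, skipped = set(), 0
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][1], residues[j][1]) > ca_cutoff:
            skipped += 1  # one cheap distance replaces the full atom-pair loop
            continue
        contacts |= atom_contacts(i, j, residues, atom_cutoff)
    return contacts, skipped


contacts, skipped = ca_prefiltered(TOY)
print(len(contacts), "contacts;", skipped, "residue pairs skipped")
```

The prefilter returns the same contacts as the baseline only when the Cα cutoff is large enough that no residue pair with atoms in contact is discarded. Choosing that bound per residue type is where domain knowledge of amino acids would come in.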