A Power Law Semantic Similarity from Gene Ontology

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2023)

Abstract

Massive amounts of data are currently being generated across the most diverse areas of knowledge. Bioinformatics is one such area: it produces huge volumes of data that must be analyzed and summarized before they can be understood. Semantic similarity is an approach that uses the features of objects in a given context to establish how similar or dissimilar those objects are. The Gene Ontology (GO) has been widely employed as a source of features for estimating the semantic similarity between its terms, and several methods for doing so have been proposed in the literature. However, these methods rely on parametric distributions or arbitrarily defined parameters that do not take the distribution of GO data into account. In this context, this work presents a data-driven method for estimating the semantic similarity between GO terms that exploits the power-law distribution. The proposed method was evaluated on a set of five metabolic pathways and compared with some of the principal methods in the literature. The results showed that the proposed method is adequate for estimating semantic similarities and that, among all the methods considered, it produced the most compact gene clusters while keeping an adequate distance between them, leading to clusters that are more assertive and less susceptible to errors. The proposed method is freely available at https://github.com/EricIto/plawss.

Acknowledgment

This study was funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (grant 440412/2022-6) and the Fundação Araucária and SETI (grant 138/2021 and NAPI - Bioinformática - grant PDI 66/2021).

Author information

Corresponding author

Correspondence to Fabricio Martins Lopes.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Augusto Ito, E., Rocha Vicente, F.F.d., Protasio Pereira, L.F., Lopes, F.M. (2023). A Power Law Semantic Similarity from Gene Ontology. In: Reis, M.S., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2023. Lecture Notes in Computer Science, vol 13954. Springer, Cham. https://doi.org/10.1007/978-3-031-42715-2_12

  • DOI: https://doi.org/10.1007/978-3-031-42715-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42714-5

  • Online ISBN: 978-3-031-42715-2

  • eBook Packages: Computer Science, Computer Science (R0)
