A Power Law Semantic Similarity from Gene Ontology


Massive amounts of data are currently being generated across the most diverse areas of knowledge. Bioinformatics is one such area, and its data must be analyzed and summarized before it can be understood. Semantic similarity is an approach that uses the features of objects in a given context to establish how similar or dissimilar those objects are. The Gene Ontology (GO) has been widely employed as a source of features for estimating the semantic similarity between its terms, and several methods for doing so have been proposed in the literature. However, these methods rely on parametric distributions or arbitrarily defined parameters that do not take the distribution of GO data into account. In this context, this work presents a data-driven method for estimating the semantic similarity between GO terms that exploits the power-law distribution. The proposed method was evaluated on a set of five metabolic pathways and compared with some of the principal methods in the literature. The results showed that the proposed method is adequate for estimating semantic similarities. Among all the methods compared, it produced the most compact gene clusters, with adequate distance between them, leading to clusters that are more accurate and less susceptible to errors. The proposed method is freely available at https://github.com/EricIto/plawss.

Keywords: Semantic similarity, Complex networks, Power-law, Bioinformatics, Pattern recognition


