Genetic Programming for Rule Generation Used in Protein Interaction Extraction from Texts
Abstract
This work aims to optimize the combination of syntax patterns used to extract protein-protein interactions from scientific text. For this purpose, we present a system based on genetic programming (GP), an evolutionary algorithm whose individuals are symbolic expressions. GP allows new rules to be generated from a preliminary set of rules defined by an expert. The classification error on a set of labeled examples serves as the evaluation (fitness) function. The training set used to evaluate the individuals is the BioCreAtIvE-PPI corpus, which contains textual information about interactions between proteins and/or genes.
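The optimization loop outlined in the abstract can be illustrated with a small Python example. It is a minimal sketch under explicit assumptions, not the system described in this work. The regular expressions in `ATOMS` are hypothetical stand-ins for expert-defined syntax patterns. `SEED_RULES` plays the role of the preliminary expert rule set, and `EXAMPLES` is invented toy data rather than BioCreAtIvE-PPI. Rules are assumed to be Boolean expression trees, and fitness is the classification error, as in the abstract. The variation operators (tournament selection, subtree crossover, subtree mutation, elitism and a depth limit against bloat) are common GP choices that may differ from those actually used.

```python
import random
import re

# Hypothetical atomic predicates standing in for the expert-defined syntax patterns.
ATOMS = {
    "verb_interacts": re.compile(r"\binteracts? with\b"),
    "verb_binds": re.compile(r"\bbinds?( to)?\b"),
    "verb_activates": re.compile(r"\bactivates?\b"),
    "negation": re.compile(r"\b(not|no)\b", re.IGNORECASE),
}
# Invented toy data, NOT from BioCreAtIvE-PPI: (sentence, describes_an_interaction).
EXAMPLES = [
    ("RAD51 interacts with BRCA2 in vivo.", True),
    ("MDM2 binds to TP53.", True),
    ("RAF1 activates MEK1.", True),
    ("BRCA1 does not interact with ACTB.", False),
    ("GAPDH is highly expressed in muscle.", False),
    ("No binding of CDK2 to ACTB was observed.", False),
]
# Hypothetical expert seed rules; a rule is ("atom", name) | ("not", r) | ("and"|"or", r1, r2).
SEED_RULES = [
    ("atom", "verb_interacts"),
    ("or", ("atom", "verb_binds"), ("atom", "verb_activates")),
]


def evaluate(rule, sentence):
    """Return True if the rule fires on the sentence."""
    op = rule[0]
    if op == "atom":
        return ATOMS[rule[1]].search(sentence) is not None
    if op == "not":
        return not evaluate(rule[1], sentence)
    if op == "and":
        return evaluate(rule[1], sentence) and evaluate(rule[2], sentence)
    return evaluate(rule[1], sentence) or evaluate(rule[2], sentence)

def classification_error(rule, examples):
    """Fitness: fraction of labeled examples the rule misclassifies (lower is better)."""
    return sum(evaluate(rule, s) != label for s, label in examples) / len(examples)

def random_rule(rng, levels=2):
    """Grow a random rule with at most `levels` operator levels above the atoms."""
    if levels == 0 or rng.random() < 0.3:
        return ("atom", rng.choice(sorted(ATOMS)))
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        return ("not", random_rule(rng, levels - 1))
    return (op, random_rule(rng, levels - 1), random_rule(rng, levels - 1))

def depth(rule):
    return 1 if rule[0] == "atom" else 1 + max(depth(c) for c in rule[1:])

def subtrees(rule, path=()):
    """Yield (path, node) for every node; a path is a tuple of child indices."""
    yield path, rule
    if rule[0] != "atom":
        for i, child in enumerate(rule[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(rule, path, new):
    """Return a copy of `rule` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return rule[:i] + (replace_at(rule[i], path[1:], new),) + rule[i + 1:]

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b onto a random node of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)

def mutate(rule, rng):
    """Subtree mutation: replace a random node with a freshly grown subtree."""
    path, _ = rng.choice(list(subtrees(rule)))
    return replace_at(rule, path, random_rule(rng))

def evolve(seed_rules, examples, pop_size=30, generations=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    def fitness(r):
        return classification_error(r, examples)
    def tournament(pop, k=3):
        return min(rng.sample(pop, k), key=fitness)
    # Initial population: the expert seed rules plus randomly grown rules.
    pop = list(seed_rules) + [random_rule(rng) for _ in range(pop_size - len(seed_rules))]
    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best rule found so far always survives
        while len(children) < pop_size:
            parent = tournament(pop)
            if rng.random() < 0.8:
                child = crossover(parent, tournament(pop), rng)
            else:
                child = mutate(parent, rng)
            children.append(child if depth(child) <= max_depth else parent)  # bloat control
        pop = children
        best = min(pop, key=fitness)
    return best, fitness(best)
```

For example, `best, err = evolve(SEED_RULES, EXAMPLES)` returns the lowest-error rule found and its error. Because the seed rules are in the initial population and the best rule is always carried over, `err` can never exceed the error of the best expert rule. On this toy data a rule such as `("or", SEED_RULES[1], ("and", ("atom", "verb_interacts"), ("not", ("atom", "negation"))))` reaches zero error, which shows how GP can refine expert rules by recombining them with new predicates.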
