Combining evidence, specificity, and proximity towards the normalization of Gene Ontology terms in text.
S Gaudan, A Jimeno Yepes, V Lee, D Rebholz-Schuhmann
EURASIP J Bioinform Syst Biol | Published: 2008
Structured information provided by manual annotation of proteins with Gene Ontology concepts represents a high-quality reliable data source for the research community. However, a limited scope of proteins is annotated due to the amount of human resources required to fully annotate each individual gene product from the literature. We introduce a novel method for automatic identification of GO terms in natural language text. The method takes into consideration several features: (1) the evidence for a GO term given by the words occurring in text, (2) the proximity between the words, and (3) the specificity of the GO terms based on their information content. The method has been evaluated on the ...
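To make the three features named in the abstract concrete, here is a minimal sketch of how they might be computed and combined for one candidate term. It is not the authors' scoring function: the abstract does not give the formulas, so the word-overlap evidence measure, the span-based proximity measure, the multiplicative combination, and every function name below are assumptions chosen for illustration. Only the definition of information content as the negative log of a term's relative frequency is standard.

```python
import math


def information_content(term_count, total_count):
    """Standard information content: -log of the term's relative frequency.

    Rarer (more specific) terms get a higher value.
    """
    return -math.log(term_count / total_count)


def evidence(term_words, text_words):
    """Assumed evidence measure: fraction of the term's words present in the text."""
    found = [w for w in term_words if w in text_words]
    return len(found) / len(term_words)


def proximity(term_words, text_words):
    """Assumed proximity measure: matched words divided by the span they cover.

    Uses the first occurrence of each word. Returns 1.0 when the matched
    words are adjacent and shrinks as they spread apart.
    """
    positions = [text_words.index(w) for w in term_words if w in text_words]
    if not positions:
        return 0.0
    span = max(positions) - min(positions) + 1
    return len(positions) / span


def score(term_words, text_words, term_count, total_count):
    """Hypothetical combination: product of the three features."""
    return (
        evidence(term_words, text_words)
        * proximity(term_words, text_words)
        * information_content(term_count, total_count)
    )


# Toy sentence and made-up corpus counts, for illustration only.
text = "regulation of the cell cycle was observed".split()
print(score(["cell", "cycle"], text, term_count=10, total_count=1000))
print(score(["regulation", "cycle"], text, term_count=10, total_count=1000))
```

With these toy inputs, the adjacent words "cell cycle" score higher than the scattered "regulation ... cycle", because the proximity factor penalizes the wider span while the other two factors are equal.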