Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot.
Frédéric Ehrler, Antoine Geissbühler, Antonio Jimeno, Patrick Ruch
BMC Bioinformatics | Published: 2005
BACKGROUND: In the context of the BioCreative competition, where training data were very sparse, we investigated two complementary tasks: 1) given a Swiss-Prot triplet, containing a protein, a GO (Gene Ontology) term and a relevant article, extraction of a short passage that justifies the GO category assignment; 2) given a Swiss-Prot pair, containing a protein and a relevant article, automatic assignment of a set of categories. METHODS: The sentence is the basic retrieval unit. Our classifier computes a distance between each sentence and the GO category provided with the Swiss-Prot entry. The Text Categorizer computes a distance between each GO term and the text of the article. Evaluations are r..
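
The two distance computations described under METHODS can be illustrated with a minimal sketch. The abstract does not say which distance measure the authors use, so the bag-of-words cosine distance below is an assumption chosen only for illustration, and every function name (`tokenize`, `cosine_distance`, `split_sentences`, `best_passage`, `rank_go_terms`) is hypothetical rather than taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two token lists (bag of words).

    Assumption: the paper's actual distance is not given in the abstract;
    cosine distance is used here purely as a stand-in.
    """
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_passage(article, go_term):
    """Task 1: pick the sentence closest to the given GO term."""
    go_tokens = tokenize(go_term)
    sentences = split_sentences(article)
    return min(sentences, key=lambda s: cosine_distance(tokenize(s), go_tokens))


def rank_go_terms(article, go_terms, top_k=3):
    """Task 2: rank candidate GO terms by distance to the whole article text."""
    doc_tokens = tokenize(article)
    ranked = sorted(go_terms, key=lambda t: cosine_distance(tokenize(t), doc_tokens))
    return ranked[:top_k]
```

With a toy article such as "The protein binds ATP in vitro. It localizes to the nucleus. Mutants show reduced kinase activity.", `best_passage(article, "protein kinase activity")` selects the third sentence, since it shares the most tokens with the GO term relative to its length. A realistic system would need term weighting and handling of synonyms, neither of which this sketch attempts.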