Learning Biological Sequence Types Using the Literature
Mohamed Reda Bouadjenek, Karin Verspoor, Justin Zobel
Association for Computing Machinery | Published: 2017
In this paper, we explore automatic classification of biological sequence types for records in biological sequence databases. The sequence type attribute provides important information about the nature of the sequence represented in a record, and is often used in search to filter out irrelevant sequences. However, the sequence type attribute is generally a non-mandatory free-text field, and is thus subject to many errors, including typos, mis-assignment, and non-assignment. In GenBank, this problem affects roughly 18% of records, an alarming proportion that should concern the biocuration community. To address this problem of automatic sequence type classification, we propose the use of literature assoc…
Awarded by Australian Research Council
This work is supported by the Australian Research Council through a Discovery Project grant, DP150101550.