Conference Proceedings

Learning Biological Sequence Types Using the Literature

Mohamed Reda Bouadjenek, Karin Verspoor, Justin Zobel

Association for Computing Machinery | Published : 2017


In this paper, we explore automatic classification of biological sequence types for records in biological sequence databases. The sequence type attribute provides important information about the nature of the sequence represented in a record, and is often used in search to filter out irrelevant sequences. However, the sequence type attribute is generally a non-mandatory free-text field, and is thus subject to many errors, including typos, mis-assignment, and non-assignment. In GenBank, this problem concerns roughly 18% of records, an alarming number that should worry the biocuration community. To address this problem of automatic sequence type classification, we propose the use of literature assoc..
