Literature consistency of bioinformatics sequence databases is effective for assessing record quality
Mohamed Reda Bouadjenek, Karin Verspoor, Justin Zobel
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | OXFORD UNIV PRESS | Published: 2017
Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness and inconsistencies with the published literature. In this work, we seek to investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of inco...
Awarded by Australian Research Council
The project receives funding from the Australian Research Council through a Discovery Project grant, DP150101550.