Journal article
Benchmarks for measurement of duplicate detection methods in nucleotide databases
Q Chen, J Zobel, K Verspoor
Database | Oxford University Press | Published: 2023
Abstract
Duplication of information in databases is a major data quality challenge. The presence of duplicates, implying either redundancy or inconsistency, can have a range of impacts on the quality of analyses that use the data. To provide a sound basis for research on this issue in databases of nucleotide sequences, we have developed new, large-scale validated collections of duplicates, which can be used to test the effectiveness of duplicate detection methods. Previous collections were either designed primarily to test efficiency, or contained only a limited number of duplicates of limited kinds. To date, duplicate detection methods have been evaluated on separate, inconsistent benchmarks, leadin..
Grants
Awarded by Australian Research Council
Funding Acknowledgements
Qingyu Chen's work is supported by an International Research Scholarship from The University of Melbourne. The project receives funding from the Australian Research Council through a Discovery Project grant, DP150101550.