Search Effectiveness in Nonredundant Sequence Databases: Assessments and Solutions.

Qingyu Chen; Xiuzhen Zhang; Yu Wan; Justin Zobel; Karin Verspoor

Journal article

Journal of Computational Biology | Mary Ann Liebert | Published : 2018

DOI: 10.1089/cmb.2018.0198

Abstract

Duplicate sequence records (that is, records having similar or identical sequences) are a challenge when searching biological sequence databases. They significantly increase database search time and can lead to uninformative search results containing similar sequences. Sequence clustering methods have been used to address this issue by grouping similar sequences into clusters. These clusters form a nonredundant database consisting of representatives (one record per cluster) and members (the remaining records in a cluster). In this approach, for nonredundant database search, users search against representatives first and optionally expand search results by exploring member records from matching clus..
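The two-stage search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cluster` class, the `search` function, the 3-mer Jaccard `similarity` score, and the 0.5 threshold are all hypothetical stand-ins chosen for illustration, whereas a real nonredundant database search would use a tool such as BLAST.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cluster:
    """One cluster of a nonredundant database (hypothetical structure)."""
    representative: str                                # one record per cluster
    members: List[str] = field(default_factory=list)   # remaining records


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Toy similarity: Jaccard index over k-mers (an assumed stand-in for BLAST)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query: str, clusters: List[Cluster],
           threshold: float = 0.5, expand: bool = False) -> List[str]:
    """Search representatives first; optionally add members of matching clusters."""
    hits: List[str] = []
    for cluster in clusters:
        if similarity(query, cluster.representative) >= threshold:
            hits.append(cluster.representative)
            if expand:
                hits.extend(cluster.members)
    return hits


# Toy database with made-up sequences.
clusters = [
    Cluster("ACGTACGT", ["ACGTACGA"]),
    Cluster("TTTTGGGG", ["TTTTGGGC"]),
]
representative_hits = search("ACGTACGT", clusters)
expanded_hits = search("ACGTACGT", clusters, expand=True)
```

In this sketch, searching without expansion returns only matching representatives, while `expand=True` also returns the member records of each matching cluster.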

University of Melbourne Researchers

Justin Zobel Author

Related Projects (1)

Natural language processing for automated validation of protein databases

The project aims to use natural language processing and information retrieval to reconcile and improve sources of biological information. Bi..

Grants

Awarded by Australian Research Council

Funding Acknowledgements

We appreciate the advice of the NCBI BLAST team on BLAST-related commands and parameters. The work of Q.C. is supported by the Melbourne International Research Scholarship from the University of Melbourne. The project receives funding from the Australian Research Council through a Discovery Project grant, DP150101550.