Journal article

Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data

Thomas S Rask, Bent Petersen, Donald S Chen, Karen P Day, Anders Gorm Pedersen

BMC Bioinformatics | BioMed Central Ltd | Published: 2016

Abstract

BACKGROUND: Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. RESULTS: The new basecalling method described here, named Multipass, implements a proba..

Grants

Awarded by National Institute of Allergy and Infectious Diseases, National Institutes of Health


Awarded by Fogarty International Center at National Institutes of Health [Program on the Ecology and Evolution of Infectious Diseases]


Awarded by Lundbeck Foundation


Funding Acknowledgements

This work was supported by the National Institute of Allergy and Infectious Diseases, National Institutes of Health [grant number R01-AI084156]; the Fogarty International Center at National Institutes of Health [Program on the Ecology and Evolution of Infectious Diseases, grant number R01-TW009670]; and the Lundbeck Foundation [grant number R48-A4847].