Journal article

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Justin M Zook, Brad Chapman, Jason Wang, David Mittelman, Oliver Hofmann, Winston Hide, Marc Salit

NATURE BIOTECHNOLOGY | NATURE PUBLISHING GROUP | Published: 2014

Abstract

Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mapp..

Funding Acknowledgements

We thank J. Johnson and A. Varadarajan from the Archon Genomics X Prize and EdgeBio for contributing their whole-genome sequencing data from SOLiD and Illumina, Complete Genomics and Life Technologies for providing bam files for NA12878, and the Broad Institute and 1000 Genomes Project for making publicly available bam and VCF files for NA12878. The Illumina exome data on GCAT were given to the Mittelman laboratory by M. Linderman at Icahn Institute of Genomics and Multiscale Biology of the Icahn School of Medicine at Mount Sinai. We thank the US Food and Drug Administration High Performance Computing staff for their support in running the bioinformatics analyses. Harvard School of Public Health contributions were funded by the Archon Genomics X PRIZE. Certain commercial equipment, instruments or materials are identified in this document. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best available for the purpose.