Journal article

In silico serotyping of E. coli from short read data identifies limited novel O-loci but extensive diversity of O:H serotype combinations within and between pathogenic lineages

Danielle J Ingle, Mary Valcanis, Alex Kuzevski, Marija Tauschek, Michael Inouye, Tim Stinear, Myron M Levine, Roy M Robins-Browne, Kathryn E Holt



The lipopolysaccharide (O) and flagellar (H) surface antigens of Escherichia coli are targets for serotyping that have traditionally been used to identify pathogenic lineages. These surface antigens are important for the survival of E. coli within mammalian hosts. However, traditional serotyping has several limitations, and public health reference laboratories are increasingly moving towards whole genome sequencing (WGS) to characterize bacterial isolates. Here we present a method to rapidly and accurately serotype E. coli isolates from raw, short read WGS data. Our approach bypasses the need for de novo genome assembly by directly screening WGS reads against a curated database of alleles li..



Awarded by NHMRC of Australia (Australian Heart Foundation Career Development Fellowship)

Awarded by Bill & Melinda Gates Foundation

Awarded by Victorian Life Sciences Computation Initiative (VLSCI)

Funding Acknowledgements

This work was supported by the NHMRC of Australia [Project Grants 1043830 to K. E. H., 1009296 and 1067428 to R. R. B.; Fellowship 1061409 to K. E. H.; Fellowship 1061435 to M. I. (co-funded by the Australian Heart Foundation Career Development Fellowship)]; the Bill & Melinda Gates Foundation (Grant 38874 to M. M. L.); and the Victorian Life Sciences Computation Initiative (VLSCI) (Grant VR0082). We thank Gordon Dougan and the sequencing teams at the Wellcome Trust Sanger Institute for sequencing the EPEC isolate collection.