Journal article

No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data

Aaron TL Lun, Gordon K Smyth

Statistical Applications in Genetics and Molecular Biology | Walter de Gruyter GmbH | Published: 2017

Abstract

RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) in a linear model is overstated when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero as well as in more complex models such as those involving paired comparisons. This misspecification results in undere..
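
As a rough illustration of the degrees-of-freedom point in the abstract, the sketch below compares the standard residual d.f. with an adjusted value for a single gene in a simple one-way layout (one coefficient per treatment group). It rests on an assumption not spelled out in the text above: a group whose counts are all zero has a fitted value of exactly zero and is taken to contribute no residual d.f. The function name `residual_df_one_way` and the example counts are hypothetical, and this is not the authors' implementation; more complex designs such as paired comparisons are not covered.

```python
def residual_df_one_way(counts_by_group):
    """Return (nominal, adjusted) residual d.f. for one gene.

    counts_by_group: list of non-empty lists of read counts, one per group.
    Assumption (illustrative only): a group with all-zero counts has a
    fitted value of exactly zero and contributes no residual d.f.
    """
    n_obs = sum(len(group) for group in counts_by_group)
    n_coefs = len(counts_by_group)

    # Standard formula: number of observations minus number of coefficients.
    nominal = n_obs - n_coefs

    # Adjusted value: only groups with at least one non-zero count
    # contribute, each giving (replicates - 1) residual d.f.
    adjusted = sum(
        len(group) - 1
        for group in counts_by_group
        if any(count > 0 for count in group)
    )
    return nominal, adjusted


# Hypothetical gene: three replicates per group, all zeros in the first group.
nominal, adjusted = residual_df_one_way([[0, 0, 0], [12, 7, 9]])
print(nominal, adjusted)  # 4 2
```

In this toy case the standard formula reports 4 residual d.f., while only the 2 d.f. from the second group carry any information about variability.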

Grants

Awarded by National Health and Medical Research Council


Funding Acknowledgements

National Health and Medical Research Council (Program Grant 1054618 to G.K.S., Fellowship 1058892 to G.K.S.); Victorian State Government Operational Infrastructure Support; Department of Education and Training, Australian Government NHMRC IRIIS.