Journal article

Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed

Laurent Jacob, Johann A Gagnon-Bartsch, Terence P Speed



When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised (e.g. when the goal is to cluster the samples or to build a corrected version of the dataset, as opposed to studying an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter ...
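The last point, that correcting for unwanted factors correlated with the factor of interest can remove the signal itself, can be seen in a toy example. The sketch below is only an illustration of the problem, not the authors' method. It assumes a single gene, noise-free data, and a batch variable that is known, whereas in the setting the article addresses neither the unwanted variation nor the factor of interest is observed. The helpers `simulate`, `regress_out` and `group_difference` are hypothetical names introduced here. The sketch regresses the batch out of the expression values by ordinary least squares and compares the difference in group means before and after.

```python
def mean(v):
    return sum(v) / len(v)


def regress_out(y, w):
    """Residuals of y after ordinary least squares on w (with an intercept)."""
    my, mw = mean(y), mean(w)
    cov = sum((a - my) * (b - mw) for a, b in zip(y, w))
    var = sum((b - mw) ** 2 for b in w)
    slope = cov / var
    return [a - my - slope * (b - mw) for a, b in zip(y, w)]


def group_difference(values, labels):
    """Mean of values where label == 1 minus mean where label == 0."""
    ones = [v for v, lab in zip(values, labels) if lab == 1]
    zeros = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(ones) - mean(zeros)


def simulate(x, w, effect=2.0, batch=1.0):
    """Hypothetical noise-free expression of one gene: effect * x + batch * w."""
    return [effect * a + batch * b for a, b in zip(x, w)]


x = [0, 0, 0, 0, 1, 1, 1, 1]        # factor of interest (unobserved in practice)
w_corr = [0, 0, 0, 1, 0, 1, 1, 1]   # batch correlated with x
w_bal = [0, 0, 1, 1, 0, 0, 1, 1]    # batch balanced across x

for name, w in [("correlated", w_corr), ("balanced", w_bal)]:
    y = simulate(x, w)
    raw = group_difference(y, x)
    corrected = group_difference(regress_out(y, w), x)
    print(f"{name}: raw={raw:.2f} corrected={corrected:.2f} (true effect 2.00)")
# correlated: raw=2.50 corrected=1.50 (true effect 2.00)
# balanced: raw=2.00 corrected=2.00 (true effect 2.00)
```

With a balanced design the correction recovers the true effect. When the batch is correlated with the factor of interest, the uncorrected difference is inflated, and the corrected one is attenuated because part of the biological signal is attributed to the batch.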




Awarded by Australian National Health and Medical Research Council Program

Funding Acknowledgements

This work was funded by the SU2C-AACR-DT0409 grant. Funding to pay the Open Access publication charges for this article was provided by Australian National Health and Medical Research Council Program Grant APP1054618.