Conference Proceedings

Statistical power in retrieval experimentation

W Webber, A Moffat, J Zobel

Proceedings of the International Conference on Information and Knowledge Management (CIKM) | Published: 2008


The power of a statistical test is the probability that it detects a true effect of a given size, so fixing a target power determines the sample size required to detect that effect reliably. In IR evaluation, the sample size is the number of topics: a power calculation indicates how many topics are likely to be sufficient to detect a given degree of superiority of one system over another. To predict the power of a test, one must estimate the variability of the population being sampled from; here, that population is the between-system score deltas. This paper demonstrates that basing such an estimate either on previous experience or on trial experiments leaves wide margins of error. Iteratively adding more topics to the test set until power is achieved is more efficient; however, we show that it leads to a bias in favour of f..
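To illustrate the kind of power calculation the abstract refers to, here is a minimal sketch that is not the paper's own procedure. It assumes a two-sided paired test on per-topic score deltas and uses the normal approximation, under which the number of topics needed to detect a true mean difference delta, given a standard deviation sigma of the deltas, is roughly n = ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2. The function names `topics_needed` and `approx_power` and the example values are hypothetical.

```python
import math
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) from the Python standard library.
_STD_NORMAL = NormalDist()


def topics_needed(delta: float, sigma: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of topics for a two-sided paired test to detect a
    true mean score difference `delta`, given the standard deviation `sigma`
    of per-topic score deltas. Assumes the normal approximation to the
    paired t-test, which slightly underestimates n for small samples."""
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = _STD_NORMAL.inv_cdf(power)           # quantile for target power
    n = ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole number of topics


def approx_power(n: int, delta: float, sigma: float,
                 alpha: float = 0.05) -> float:
    """Approximate power of the same test with n topics (ignores the
    negligible probability of rejecting in the wrong direction)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(delta * math.sqrt(n) / sigma - z_alpha)


if __name__ == "__main__":
    # Hypothetical values: detect a 0.02 difference in mean effectiveness.
    for sigma in (0.10, 0.15):
        n = topics_needed(0.02, sigma)
        print(sigma, n, round(approx_power(n, 0.02, sigma), 3))
```

Because sigma enters the formula squared, an estimate of sigma that is 50% too high inflates the required number of topics by a factor of 2.25 (197 versus 442 topics in the hypothetical example above). This sensitivity is one way to see why an imprecise estimate of the variability of score deltas translates into wide margins of error in the predicted number of topics.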

