Conference Proceedings

Estimating system effectiveness scores with incomplete evidence

SD Ravana, A Moffat

ADCS 2010 - Proceedings of the Fifteenth Australasian Document Computing Symposium | Published: 2010


To control experimental cost, it is common to use only partial relevance judgments when comparing retrieval system effectiveness. Using TREC data, we consider the uncertainty that pooled judgments introduce into per-topic effectiveness scores, and measure the effect that incomplete evidence has both on the system scores that are generated and on the quality of paired system comparisons. We measure system behavior from three different points of view: the trend in effectiveness scores; the separability of system pairs; and the number of reversals in significance outcomes as the depth of judgments increases. Our results show that when shallow pooled judgments are used sys..

