Conference Proceedings

Better Effectiveness Metrics for SERPs, Cards, and Rankings

Paul Thomas, Alistair Moffat, Peter Bailey, Falk Scholer, Nick Craswell

Proceedings of the 23rd Australasian Document Computing Symposium | Association for Computing Machinery | Published: 2018


Offline metrics for IR evaluation are often derived from a user model that seeks to capture the interaction between the user and the ranking, conflating the user’s interaction with a ranking of documents with the user’s interaction with the search results page. A desirable property of any effectiveness metric is that the scores it generates over a set of rankings correlate well with the “satisfaction” or “goodness” scores attributed to those same rankings by a population of searchers. Using data from a large-scale web search engine, we find that offline effectiveness metrics do not correlate well with a behavioural measure of satisfaction that can be inferred from user activity logs. We then examine ..



Awarded by Australian Research Council

Funding Acknowledgements

We thank Bodo von Billerbeck and Alex Moore for their help wrangling data. This work was partially supported by the Australian Research Council (Project DP180102687).