Better Effectiveness Metrics for SERPs, Cards, and Rankings
Paul Thomas, Alistair Moffat, Peter Bailey, Falk Scholer, Nick Craswell
Proceedings of the 23rd Australasian Document Computing Symposium | Association for Computing Machinery | Published: 2018
Offline metrics for IR evaluation are often derived from a user model that seeks to capture the interaction between the user and the ranking, conflating the user’s interaction with a ranking of documents with their interaction with the search results page. A desirable property of any effectiveness metric is that the scores it generates over a set of rankings correlate well with the “satisfaction” or “goodness” scores attributed to those same rankings by a population of searchers. Using data from a large-scale web search engine, we find that offline effectiveness metrics do not correlate well with a behavioural measure of satisfaction that can be inferred from user activity logs. We then examine ...
Awarded by Australian Research Council
We thank Bodo von Billerbeck and Alex Moore for their help wrangling data. This work was partially supported by the Australian Research Council (Project DP180102687).