Conference Proceedings

Truth inference at scale: A Bayesian model for adjudicating highly redundant crowd annotations

Y Li, BIP Rubinstein, T Cohn, Ling Liu, Ryen White

The World Wide Web Conference - WWW '19 | ACM | Published: 2019


Crowd-sourcing is a cheap and popular means of creating training and evaluation datasets for machine learning; however, it poses the problem of 'truth inference', as individual workers cannot be wholly trusted to provide reliable annotations. Research into models of annotation aggregation attempts to infer a latent 'true' annotation, which has been shown to improve the utility of crowd-sourced data. However, existing techniques beat simple baselines only in low-redundancy settings, where the number of annotations per instance is low (≤ 3), or in situations where workers are unreliable and produce low-quality annotations (e.g., through spamming, random, or adversarial behaviours). As we show, ..



Awarded by Australian Research Council

Funding Acknowledgements

This work was sponsored by the Defense Advanced Research Projects Agency Information Innovation Office (I2O) under the Low Resource Languages for Emergent Incidents (LORELEI) program issued by DARPA/I2O under Contract No. HR0011-15-C-0114. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Trevor Cohn and Benjamin Rubinstein were supported by the Australian Research Council, FT130101105 and DP150103710, respectively.