Conference Proceedings
Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance
NG Marchant, BIP Rubinstein
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | Association for Computing Machinery | Published: 2021
Abstract
Important tasks like record linkage and extreme classification demonstrate extreme class imbalance, with 1 minority instance to every 1 million or more majority instances. Obtaining a sufficient sample of all classes, even just to achieve statistically significant evaluation, is so challenging that most current approaches yield poor estimates or incur impractical cost. Where importance sampling has been leveled against this challenge, restrictive constraints are placed on performance metrics, estimates do not come with appropriate guarantees, or evaluations cannot adapt to incoming labels. This paper develops a framework for online evaluation based on adaptive importance sampling. Given a tar..
Grants
Awarded by Australian Research Council
Funding Acknowledgements
N. Marchant acknowledges the support of an Australian Government Research Training Program Scholarship. B. Rubinstein acknowledges the support of Australian Research Council grant DP150103710.