Conference Proceedings

Towards Fair Supervised Dataset Distillation for Text Classification

X Han, A Shen, Y Li, L Frermann, T Baldwin, T Cohn

SustaiNLP 2022: Third Workshop on Simple and Efficient Natural Language Processing, Proceedings of the Workshop | Published: 2022

Abstract

With the growing prevalence of large-scale language models, their energy footprint and their potential to learn and amplify historical biases are two pressing challenges. Dataset distillation (DD) reduces the size of a dataset by learning a small number of synthetic samples that encode the information in the original data, thereby reducing the cost of model training; however, its impact on fairness has not been studied. We investigate how DD affects group bias in the context of text classification tasks, with experiments over two datasets, concluding that vanilla DD preserves the bias of the dataset. We then show how existing debiasing methods can be combined with DD to …
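The abstract's one-line description of dataset distillation can be made concrete with a toy sketch. The example below is illustrative only and is not the paper's method: it distills a linear-regression dataset of 200 points into five synthetic labels by differentiating the real-data loss through the closed-form ridge solution trained on the synthetic set. All data, sizes, and hyperparameters here are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" dataset: 200 points from a noisy linear model (illustrative only).
n, d, m = 200, 3, 5
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Synthetic set: inputs fixed at random; we distill only the labels y_s
# (a soft-label simplification that keeps the optimization in closed form).
X_s = rng.normal(size=(m, d))
y_s = np.zeros(m)

# Inner learner: ridge regression trained on the synthetic set has the
# closed form w_s = M @ y_s, which makes the outer problem differentiable.
lam = 1e-3
M = np.linalg.solve(X_s.T @ X_s + lam * np.eye(d), X_s.T)  # shape (d, m)

# Outer loop: gradient descent on y_s to minimize the *real-data* MSE of
# the model trained on the synthetic set. The outer objective is quadratic
# in y_s, so 1 / (largest Hessian eigenvalue) is a stable step size.
H = 2.0 / n * M.T @ X.T @ X @ M
lr = 1.0 / np.linalg.eigvalsh(H).max()
for _ in range(3000):
    w_s = M @ y_s
    grad_w = 2.0 / n * X.T @ (X @ w_s - y)  # d(real MSE)/d(w_s)
    y_s -= lr * (M.T @ grad_w)              # chain rule through w_s = M @ y_s

# A model trained on only the 5 distilled points recovers the full-data fit.
w_distilled = M @ y_s
w_real = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the distilled labels are optimized against the original data's loss, they also inherit whatever group statistics that data encodes, which is the bias-preservation concern the abstract raises.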
