Conference Proceedings

MultiSpanQA: A Dataset for Multi-Span Question Answering

H Li, M Vasardani, M Tomko, T Baldwin

NAACL 2022: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference | Association for Computational Linguistics (ACL) | Published: 2022

Abstract

Most existing reading comprehension datasets focus on single-span answers, which can be extracted as a single contiguous span from a given text passage. Multi-span questions, i.e., questions whose answer is a series of multiple discontiguous spans in the text, are common in real life but are less studied. In this paper, we present MultiSpanQA, a new dataset that focuses on questions with multi-span answers. Raw questions and contexts are extracted from the Natural Questions (Kwiatkowski et al., 2019) dataset. After multi-span re-annotation, MultiSpanQA consists of a total of over 6,000 multi-span questions in the basic version, and over 19,000 examples with unanswerable questions, and questi..
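To make the single-span versus multi-span distinction concrete, here is a minimal illustrative sketch, assuming an answer is stored as a list of character-offset spans over the passage. The `Span` type, the `extract` and `is_multi_span` helpers, and the toy passage are all hypothetical; this is not the MultiSpanQA release format or the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical answer span: half-open character offsets [start, end)."""
    start: int
    end: int


def extract(context: str, spans: list[Span]) -> list[str]:
    """Return the answer text for each span, in the order given."""
    return [context[s.start:s.end] for s in spans]


def is_multi_span(spans: list[Span]) -> bool:
    """True if the answer has two or more spans separated by at least one character."""
    ordered = sorted(spans, key=lambda s: s.start)
    if len(ordered) < 2:
        return False
    # Discontiguous: each span must end strictly before the next one starts.
    return all(a.end < b.start for a, b in zip(ordered, ordered[1:]))


# Toy passage, invented for illustration.
context = "The flag is red, white and blue."
multi = [Span(12, 15), Span(17, 22), Span(27, 31)]
single = [Span(12, 31)]

print(extract(context, multi), is_multi_span(multi))
print(extract(context, single), is_multi_span(single))
```

The question "What colours are on the flag?" has the multi-span answer `red`, `white`, `blue`; a single-span model could only return the contiguous string `red, white and blue`, which also includes the connecting text between the answer items.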

Funding Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive reviews. This research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. This research was supported by Australian Research Council grant DP170100109.