Conference Proceedings

Data sets for spoken conversational search

J Trippas, P Thomas

CEUR Workshop Proceedings | ACM | Published: 2019


There is increasing interest in spoken conversational search—multi-turn interactions with a search engine, spoken in natural language—but until recently there was little public data to support research. We describe our experiences building two data sets for spoken conversational search: the Microsoft Information-Seeking Conversation set (“MISC”) and the Spoken Conversational Search set (“SCSdata”). Each data set contains recordings of spoken interactions between two people collaborating on web search tasks, but relatively small differences in protocol have led to observably different data. We discuss some consequences of these differences, and describe attempts to reproduce analyses from one..
