Conference Proceedings

Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation

Oliver Adams, Trevor Cohn, Graham Neubig, Hilaria Cruz, Steven Bird, Alexis Michaud, Sara Goggi (ed.), Helene Mazo (ed.)

LREC 2018 - 11th International Conference on Language Resources and Evaluation | European Language Resources Association | Published: 2019


Transcribing speech is an important part of language documentation, yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network architecture with the connectionist temporal classification loss function for phonemic and tonal transcription in a language documentation setting. In this framework, we explore jointly modelling phonemes and tones versus modelling them separately, and assess the importance of pitch information versus phonemic context for tonal prediction. Experiments on two tonal languages, Yongning Na and Eastern Chatino, show the changes in recognition performance as training data is scaled from 10 minutes up to 50 minu..
