Conference Proceedings

Modelling Tibetan Verbal Morphology

Timothy Baldwin, Qianji Di, Ekaterina Vylomova

Proceedings of the 17th Workshop of the Australasian Language Technology Association | Australasian Language Technology Association | Published: 2019

Abstract

The Tibetan language, despite being spoken by 8 million people, is a low-resource language in NLP terms, and research to develop NLP tools and resources for the language has only just begun. In this paper, we focus on Tibetan verbal morphology, which is known to be quite irregular, and introduce a novel dataset for Tibetan verbal paradigms, comprising 1,433 lemmas with corresponding inflected forms. This enables the largest-scale NLP investigation to date on Tibetan morphological reinflection, wherein we compare the performance of several state-of-the-art models for morphological reinflection and conduct an extensive error analysis. We show that 84% of errors are due to the irregularity of..
