Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing

Maria Barrett; Ana Valeria Gonzalez-Garduño; Lea Frermann; Anders Søgaard

Conference Proceedings

Association for Computational Linguistics | Published: 2018

DOI: 10.18653/v1/n18-1184

Abstract

When learning POS taggers and syntactic chunkers for low-resource languages, different resources may be available, and often all we have is a small tag dictionary, motivating type-constrained unsupervised induction. Even small dictionaries can improve the performance of unsupervised induction algorithms. This paper shows that performance can be further improved by including data that is readily available or can be easily obtained for most languages, i.e., eye-tracking, speech, or keystroke logs (or any combination thereof). We project information from all these data sources into shared spaces, in which the union of words is represented. For English unsupervised POS induction, the additional ..

University of Melbourne Researchers

Lea Frermann (Author)

Funding Acknowledgements

Thanks to Desmond Elliott for valuable ideas. This research was partially funded by the ERC Starting Grant LOWLANDS No. 313695, as well as by Trygfonden.

Citation metrics

Dimensions: 10