Conference Proceedings

English to Persian transliteration

Sarvnaz Karimi, Andrew Turpin, Falk Scholer, F Crestani (ed.), P Ferragina (ed.), M Sanderson (ed.)

STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS | SPRINGER-VERLAG BERLIN | Published: 2006

Abstract

Persian is an Indo-European language written using Arabic script, and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English - that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary - is an unstudied problem. In this paper we make three novel contributions. First, we present performance comparisons of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English...

