The correspondence between orthography and pronunciation in Modern Standard Arabic (MSA) falls somewhere between that of languages such as Spanish and Finnish, which have an almost one-to-one mapping between letters and sounds, and languages such as English and French, which exhibit a more complex letter-to-sound mapping (El-Imam, 2004). The more complex this mapping is, the more difficult the language is for Automatic Speech Recognition (ASR).

An essential component of an ASR system is its pronunciation dictionary (lexicon), which maps the orthographic representation of words to their phonetic or phonemic pronunciation variants. For languages with complex letter-to-sound mappings, such dictionaries are typically written by hand. However, for morphologically rich languages, such as MSA, pronunciation dictionaries are difficult to create by hand, because of the large number of word forms, each of which has a large number of possible pronunciations. Fortunately, the relationship between orthography and pronunciation is relatively regular and well understood for MSA. Moreover, recent automatic techniques for morphological analysis and disambiguation (MADA) can also be useful in automating part of the dictionary creation process (Habash and Rambow, 2005; Habash and Rambow, 2007). Nonetheless, most documented Arabic ASR systems appear to handle only a subset of Arabic phonetic phenomena; very few use morphological disambiguation tools.

In Section 2, we briefly describe related work, including the baseline system we use. In Section 3, we outline the linguistic phenomena we believe are critical to improving MSA pronunciation dictionaries. In Section 4, we describe the pronunciation rules we have developed based upon these linguistic phenomena. In Section 5, we describe how these rules are used, together with MADA, to build our pronunciation dictionaries for training and decoding automatically. In Section 6, we present results of our evaluations of our phone- and word-recognition systems (XPR and XWR) on MSA compa
