This article explains how the dictionary was created and what the result looks like in simon.
A. How I create Ralf's Yiddish dictionary:
1. Get the Yiddish spelling dictionary.
2. The license is GPLv3.
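3. Extract jidysz.net.ooo.spellchecker.oxt. An .oxt extension is a ZIP archive; it contains the Hunspell files yi.dic and yi.aff used in the next step.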
4. Expand the affix-compressed Hunspell dictionary into a full word list with unmunch. Ubuntu terminal:
cd /home/ubuntu/Documents/2011-II/Yiddish/dictionaries
sudo apt-get install hunspell-tools
unmunch yi.dic yi.aff > yiddish-wordlist
5. Add <lexicon> at the beginning of yiddish-wordlist and </lexicon> at the end, so that saxonb-xslt can read the file as an XML document.
6. Generate an XML document with lexicon, lexeme, and grapheme elements:
ubuntu@ubuntu:~/Documents/2011-II/Yiddish/dictionaries$ saxonb-xslt -s:yiddish-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:yiddish.xml
7. The ISO 639-1 language code is yi.
8. I plan to use this table as the source for the grapheme-to-phoneme mapping.
9. Ubuntu terminal:
ubuntu@ubuntu:~/Documents/2011-II/Yiddish/dictionaries$ saxonb-xslt -s:yiddish.xml -xsl:'improve-yiddish.xsl' -o:yiddish-dictionary.xml
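Steps 3, 5 and 6 can also be approximated in Python for readers without saxonb-xslt. This is a minimal sketch, not the create-xml-file.xsl transformation itself. It assumes that the .oxt stores its Hunspell files with .dic and .aff extensions, that the unmunch output is UTF-8 with one word per line, and that the target structure is lexicon > lexeme > grapheme without a namespace. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from xml.sax.saxutils import escape


def extract_hunspell_files(oxt_path, dest_dir):
    """Step 3: extract the .dic/.aff files from an .oxt extension.

    An .oxt file is a ZIP archive, so zipfile can read it directly.
    Returns the archive member names that were extracted.
    """
    with zipfile.ZipFile(oxt_path) as archive:
        members = [n for n in archive.namelist() if n.endswith((".dic", ".aff"))]
        archive.extractall(dest_dir, members=members)
    return members


def wrap_wordlist(text):
    """Step 5: surround the unmunch output with <lexicon> ... </lexicon>.

    Escapes &, < and > so the result stays well-formed XML.
    """
    return "<lexicon>\n" + escape(text.rstrip("\n")) + "\n</lexicon>\n"


def wordlist_to_lexicon(lines):
    """Step 6 (assumed structure): one <lexeme><grapheme> per word."""
    lexicon = ET.Element("lexicon")
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        lexeme = ET.SubElement(lexicon, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = word
    return lexicon


def build_xml_file(wordlist_path, xml_path):
    """Read the raw unmunch output (UTF-8 assumed) and write the XML file."""
    with open(wordlist_path, encoding="utf-8") as f:
        lexicon = wordlist_to_lexicon(f)
    ET.ElementTree(lexicon).write(xml_path, encoding="utf-8", xml_declaration=True)


# Usage (hypothetical): build_xml_file("yiddish-wordlist", "yiddish.xml")
```

Because build_xml_file reads the raw unmunch output and lets ElementTree create the elements, the manual wrapping of step 5 is only needed on the saxonb-xslt route; wrap_wordlist mirrors that step for completeness.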
B. Download the dictionary and import it into simon.
Take a look at the result. The dictionary contains 99,980 words. The left column contains the Yiddish words, and the right column contains the corresponding SAMPA transcriptions.
Yiddish is written in the Hebrew alphabet, which runs from right to left, whereas the SAMPA transcriptions run from left to right. Because the first letter of a Yiddish word (its rightmost character) corresponds to the first phoneme (the leftmost SAMPA symbol), the phoneme order should be fine.
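A small Python sketch illustrates why the order works out. It assumes the words are stored as Unicode text, which keeps Hebrew-script letters in logical (reading) order regardless of display direction. The G2P table and the to_sampa function are hypothetical and heavily simplified; they are not the table or the XSL transformation used to build this dictionary.

```python
# Hypothetical, simplified grapheme-to-phoneme table (NOT the table used for
# the real dictionary): a few YIVO-style Yiddish letters mapped to SAMPA.
G2P = {
    "\u05d0\u05b8": "O",  # komets-alef (alef + qamats)
    "\u05d0\u05b7": "a",  # pasekh-alef (alef + patah)
    "\u05d0": "",         # shtumer alef: not pronounced
    "ב": "b",
    "ג": "g",
    "ו": "u",
    "ז": "z",
    "ט": "t",
    "י": "i",
    "ך": "x",
    "כ": "x",
    "ן": "n",
    "ע": "E",
    "ר": "r",
}
MAX_LEN = max(len(g) for g in G2P)


def to_sampa(word):
    """Convert a Yiddish word to space-separated SAMPA phonemes.

    Index 0 of the string is the first letter read, i.e. the rightmost one
    on screen, so scanning from index 0 emits phonemes in spoken order.
    """
    phonemes = []
    i = 0
    while i < len(word):
        # Greedy longest match, so alef + vowel point wins over bare alef.
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            grapheme = word[i:i + size]
            if grapheme in G2P:
                if G2P[grapheme]:
                    phonemes.append(G2P[grapheme])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} in {word!r}")
    return " ".join(phonemes)


if __name__ == "__main__":
    for w in ["ביכער", "\u05d8\u05d0\u05b8\u05d2", "איך"]:
        print(w, "->", to_sampa(w))
```

The greedy longest match lets multi-character graphemes such as komets-alef take priority over a bare alef; a real table would also need rules for letter combinations and for words of Hebrew origin, which are not spelled phonetically.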
Many other PLS (Pronunciation Lexicon Specification) dictionaries are available; find the one that suits your language.