This article explains how I created the dictionary and what the imported result looks like in simon.
A. Creation of the PLS dictionary:
1. Get the spelling dictionary.
2. The license is GPL. The file README_en.txt says:
This spell check dictionary for Interlingua is licensed under GPL. [...] This hyphenation rules for Interlingua are licensed under GPL.
This means that I can use this spelling dictionary as a source.
3. Extract dict-ia-2010-11-29.oxt.
4. The ISO 639-1 language code is ia.
5. I will probably use this table for grapheme-to-phoneme conversion.
6. Check the encoding of ia_iso.aff and ia_iso.dic. Both files are encoded in ISO 8859-1. It is probably best to convert both files to UTF-8:
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.dic > interlingua-utf8.dic
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.aff > interlingua-utf8.aff
Change the first line of interlingua-utf8.aff to SET UTF-8. Both files end each line with CRLF (Windows line endings). I don’t know whether the unmunch command accepts this, so I will try it:
ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ unmunch interlingua-utf8.dic interlingua-utf8.aff > interlingua-wordlist
It worked. The CRLF line endings are in the source files only; the output file uses plain LF (Unix line endings). There are a lot of duplicate entries. I expect that an .xsl script will remove these duplicates later.
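The checks in this step can be sketched in shell. This is only a minimal sketch on made-up sample files: sample_iso.dic and sample-wordlist are hypothetical stand-ins for ia_iso.dic and interlingua-wordlist, and removing duplicates with sort -u is an alternative I am assuming here, not the .xsl script mentioned above.

```shell
# Hypothetical stand-in for ia_iso.dic: ISO 8859-1 bytes with CRLF line ends
# (\351 is the ISO 8859-1 byte for "é").
printf '2\r\ncaf\351\r\nlibro\r\n' > sample_iso.dic

# Convert to UTF-8 in the same way as the iconv commands above.
iconv -f ISO-8859-1 -t UTF-8 < sample_iso.dic > sample-utf8.dic

# iconv exits non-zero on invalid input, so this doubles as a UTF-8 validity check.
iconv -f UTF-8 -t UTF-8 < sample-utf8.dic > /dev/null && echo "valid UTF-8"

# Count the carriage returns to see whether the file still uses CRLF.
cr_count=$(tr -cd '\r' < sample-utf8.dic | wc -c)
echo "carriage returns: $cr_count"

# Hypothetical stand-in for the unmunch output, with one duplicate entry.
printf 'libro\ncaf\303\251\nlibro\n' > sample-wordlist

# Remove duplicates; LC_ALL=C makes the sort order byte-wise and reproducible.
LC_ALL=C sort -u sample-wordlist > sample-wordlist-unique
```

Note that sort -u also reorders the entries, which may or may not matter for the later steps.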
7. Add lexicon tags at the beginning and the end of interlingua-wordlist.
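Adding the tags by hand works, but it can also be scripted. The following is a minimal sketch: sample-wordlist is a hypothetical stand-in for interlingua-wordlist, and the bare lexicon element without attributes is an assumption, since I have not checked here what create-xml-file.xsl expects.

```shell
# Hypothetical stand-in for interlingua-wordlist.
printf 'casa\nlibro\n' > sample-wordlist

# Write an opening tag, the unchanged word list, and a closing tag.
# A bare <lexicon> element is an assumption; the stylesheet may expect attributes.
{
  printf '<lexicon>\n'
  cat sample-wordlist
  printf '</lexicon>\n'
} > sample-wordlist-tagged

cat sample-wordlist-tagged
```

Writing to a new file keeps the untagged word list around in case the step has to be repeated.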
8. Create XML file:
ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ saxonb-xslt -s:interlingua-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:interlingua.xml
9. Create PLS dictionary:
ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ saxonb-xslt -s:interlingua.xml -xsl:'improve-interlingua.xsl' -o:interlingua-dictionary.xml
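Before importing, it may be worth counting the entries in the result. The sketch below runs on a made-up sample file: sample-dictionary.xml, its two entries, and their transcriptions are purely illustrative, and the alphabet value is an assumption. The real output of improve-interlingua.xsl may be laid out differently; only the lexeme, grapheme, and phoneme element names and the namespace come from the PLS format itself.

```shell
# Made-up two-entry PLS file; the transcriptions are illustrative only.
cat > sample-dictionary.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="x-sampa" xml:lang="ia">
  <lexeme><grapheme>casa</grapheme><phoneme>kasa</phoneme></lexeme>
  <lexeme><grapheme>libro</grapheme><phoneme>libro</phoneme></lexeme>
</lexicon>
EOF

# Count the opening lexeme tags; grep -o prints one match per line,
# so this also works when several entries share a line.
lexeme_count=$(grep -o '<lexeme>' sample-dictionary.xml | wc -l)
echo "lexemes: $lexeme_count"
```

Comparing this count with the number of unique words in the word list shows whether entries were lost or duplicated along the way.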
B. Download the dictionary. Import it into simon.
The left column contains the words. The pronunciation column contains the corresponding SAMPA transcriptions. The Category column contains just “Unknown” entries.
Now you know how I created the dictionary and what the result looks like in simon.







