How I create Ralf's Valencian dictionary:
1. Get the spelling dictionary. Its license is GPL.
2. The encoding of valencian.dic and valencian.aff is ISO-8859-1.
3. I may use the Catalan language code (and eSpeak's Catalan voice for the grapheme-to-phoneme conversion).
4. Convert valencian.dic to UTF-8:
ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ iconv -f ISO8859-1 -t UTF-8 < valencian.dic > valencian-utf8.dic
5. Convert valencian.aff to UTF-8:
ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ iconv -f ISO8859-1 -t UTF-8 < valencian.aff > valencian-utf8.aff
6. Change the line in valencian-utf8.aff that contains SET ISO8859-1 to SET UTF-8.
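The edit in step 6 can also be scripted instead of done by hand. This is only a sketch: the function name set_aff_utf8 is made up for illustration, and it assumes the declaration sits at the start of a line exactly as "SET ISO8859-1".

```shell
#!/bin/sh
# Hypothetical helper: rewrite the encoding declaration of an .aff file.
# Assumes the line starts with "SET ISO8859-1"; all other lines are kept as-is.
set_aff_utf8() {
    aff="$1"
    # Write to a temporary file first, then replace the original.
    sed 's/^SET ISO8859-1/SET UTF-8/' "$aff" > "$aff.tmp" && mv "$aff.tmp" "$aff"
}

# Usage:
#   set_aff_utf8 valencian-utf8.aff
```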
7. Generate Valencian word list:
ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ unmunch valencian-utf8.dic valencian-utf8.aff > valencian
The resulting word list is too large: about 3 million words. Many of the words contain a hyphen. I could filter those out, or use valencian-utf8.dic as the source instead.
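If I went the filtering route, dropping the hyphenated entries could look roughly like this. It is a sketch only: the function name filter_hyphenated and the output file name valencian-nohyphen are my own placeholders, and I have not checked how much smaller the list actually gets.

```shell
#!/bin/sh
# Hypothetical helper: read a word list on stdin, drop every entry that
# contains a hyphen, and remove duplicates.
filter_hyphenated() {
    # -e keeps grep from treating the pattern "-" as an option.
    # LC_ALL=C makes the sort order byte-wise and reproducible.
    grep -v -e '-' | LC_ALL=C sort -u
}

# Usage:
#   filter_hyphenated < valencian > valencian-nohyphen
#   wc -l valencian valencian-nohyphen
```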
8. Add <lexicon> at the beginning of the file valencian-utf8.dic and </lexicon> at the end of the file.
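Step 8 can be done in a text editor, but a small shell function works too. The name wrap_lexicon and the temporary file name below are assumptions for illustration; the sketch also assumes the dictionary contains no characters such as & or < that would need escaping in XML.

```shell
#!/bin/sh
# Hypothetical helper: print the file wrapped in a <lexicon> root element.
wrap_lexicon() {
    printf '%s\n' '<lexicon>'
    cat "$1"
    printf '%s\n' '</lexicon>'
}

# Usage (replaces the file with its wrapped version):
#   wrap_lexicon valencian-utf8.dic > valencian-utf8.dic.tmp &&
#       mv valencian-utf8.dic.tmp valencian-utf8.dic
```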
9. Generate XML file with <grapheme> elements:
ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ saxonb-xslt -s:valencian-utf8.dic -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:valencian.xml
10. Generate a first draft with <phoneme> elements:
ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ saxonb-xslt -s:valencian.xml -xsl:'http://spirit.blau.in/simon/files/2010/04/improve-estonian-dictionary.xsl' -o:valencian-pls.xml
11. Import Ralf's Valencian dictionary into simon.
12. Ralf's Valencian dictionary is just a first draft. If someone shows interest, I could generate the phonemes with eSpeak (voice: ca – Catalan).
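For the eSpeak idea in step 12, a first attempt could look like the sketch below. The function name phonemes_ca is made up. It relies on the espeak options -q (no audio), -x (print phoneme mnemonics) and -v (voice). The output uses eSpeak's own phoneme notation, which would probably still have to be mapped to whatever phoneme set the dictionary ends up using.

```shell
#!/bin/sh
# Hypothetical helper: read one word per line on stdin and print
# "word<TAB>phonemes", using eSpeak's Catalan voice.
phonemes_ca() {
    while IFS= read -r word; do
        # Skip empty lines.
        [ -n "$word" ] || continue
        # Strip the leading blanks that espeak may print before the phonemes.
        # stdin is redirected so espeak cannot consume the word list.
        ph=$(espeak -q -x -v ca "$word" < /dev/null | sed 's/^ *//')
        printf '%s\t%s\n' "$word" "$ph"
    done
}

# Usage (the input must contain bare words, one per line):
#   phonemes_ca < words.txt > words-phonemes.txt
```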



