Ralf’s Valencian speech model

Friday, May 18th, 2012

A few words about how this speech model was created:

1. Download Ralf’s Valencian dictionary.
2. Create a new scenario “Valencian”. Delete the old shadow dictionary, then import Ralf’s Valencian dictionary as the shadow dictionary.
3. Add ten Valencian words to training:

ababol, carnificar, contrastada, desencovenat, disputant, encepada, improductible, imperfecta, senatori, tartana

4. Grammar Unknown. Commands: Dictation plugin. Actions > Synchronize. Actions > Activate. Dictate:

senatori carnificar contrastada desencovenat disputant encepada desencovenat senatori tartana

5. Export scenario and base model.
6. Get Ralf’s Valencian speech model.

Ralf’s Valencian dictionary

Tuesday, May 11th, 2010

How I created Ralf's Valencian dictionary:

1. Get the spelling dictionary. Its license is GPL.

2. The encoding of valencian.dic and valencian.aff is ISO-8859-1.

3. I may use the Catalan language code (and Catalan for the grapheme-to-phoneme conversion by eSpeak).

4. Convert valencian.dic to UTF-8:

ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ iconv -f ISO8859-1 -t UTF-8 < valencian.dic > valencian-utf8.dic

5. Convert valencian.aff to UTF-8:

ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ iconv -f ISO8859-1 -t UTF-8 < valencian.aff > valencian-utf8.aff

6. Change the line in valencian-utf8.aff that contains SET ISO8859-1 to SET UTF-8.
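
Step 6 can also be done without opening an editor. This is only a minimal sketch: it assumes sed is available and that the declaration sits on a line of its own starting with SET. The function name set_aff_encoding_utf8 and the output file name are made up for illustration.

```shell
# Hypothetical helper: read an .aff file on stdin and write it to stdout
# with the "SET ISO8859-1" line replaced by "SET UTF-8".
# Assumption: the declaration is alone on its line and starts with SET.
set_aff_encoding_utf8() {
    sed 's/^SET[[:space:]][[:space:]]*ISO8859-1[[:space:]]*$/SET UTF-8/'
}

# Usage (writes a new file instead of editing in place):
# set_aff_encoding_utf8 < valencian-utf8.aff > valencian-utf8-fixed.aff
```

Writing to a new file keeps the converted file around in case the pattern does not match as expected.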

7. Generate Valencian word list:

ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ unmunch valencian-utf8.dic valencian-utf8.aff > valencian

The word list is too large: 3 million words are too many. Many of them contain a hyphen. I could filter those out, or use valencian-utf8.dic as the source instead.
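
Filtering out the hyphenated words could look roughly like this. It is a sketch, not what I actually ran: it assumes the unmunch output has one word per line, and the function name drop_hyphenated is made up for illustration.

```shell
# Hypothetical helper: drop every line that contains a hyphen.
# Reads one word per line on stdin, writes the remaining words to stdout.
# "|| true" keeps the exit status at 0 even when grep selects no lines.
drop_hyphenated() {
    grep -vF -e '-' || true
}

# Usage: count how many words are left after filtering.
# drop_hyphenated < valencian | wc -l
```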

8. Add <lexicon> at the beginning of the file valencian-utf8.dic and </lexicon> at the end of the file.
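
The wrapping in step 8 can be scripted instead of done by hand. A minimal sketch; the function name wrap_in_lexicon and the output file name are hypothetical. It does not escape & or <, so the result is well-formed XML only if the file contains neither character.

```shell
# Hypothetical helper: wrap whatever arrives on stdin in
# <lexicon> ... </lexicon> and write the result to stdout.
# Note: no XML escaping is done on the input.
wrap_in_lexicon() {
    printf '%s\n' '<lexicon>'
    cat
    printf '%s\n' '</lexicon>'
}

# Usage (writes a new file; the original .dic stays untouched):
# wrap_in_lexicon < valencian-utf8.dic > valencian-utf8-wrapped.dic
```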

9. Generate XML file with <grapheme> elements:

ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ saxonb-xslt -s:valencian-utf8.dic -xsl:'' -o:valencian.xml

10. Generate a first draft with <phoneme> elements:

ubuntu@ubuntu-desktop:~/Documents/201005/valencian-dictionary$ saxonb-xslt -s:valencian.xml -xsl:'' -o:valencian-pls.xml

11. Import Ralf's Valencian dictionary into simon.

12. Ralf's Valencian dictionary is just a first draft. If someone shows interest, I could generate the phonemes with eSpeak (voice: ca – Catalan).