Ralf’s Lower Sorbian speech model

Wednesday, May 16th, 2012

A few notes on how this speech model was created:

1. Get the PLS dictionary.
2. Create a Simon scenario named LowerSorbian.
3. Import Ralf's Lower Sorbian dictionary into Simon as a shadow dictionary.
4. Select a few words that I want to train. For this speech model, I want to train just five words: kóštowaś, kóžkarka, kóžna, pódpažonej, pśizemski.
5. Add the word “Unknown” as the grammar.
6. Add the Dictation plugin.
7. Press Synchronize, then press Activate. Simon recognizes only two of the five words:

kóštowaś kóžna kóžna kóžna kóštowaś kóžna

This speech model is really bad, but it demonstrates the concept.

8. Download Ralf’s Lower Sorbian speech model.
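
The "two words out of five" figure from step 7 can be checked with a short script. This is only a sketch: the function name recognition_coverage is made up for illustration, and the word list and output string are copied from the steps above.

```python
# The five words selected for training in step 4.
trained = ["kóštowaś", "kóžkarka", "kóžna", "pódpažonej", "pśizemski"]

# Simon's output from step 7, copied verbatim.
recognized_output = "kóštowaś kóžna kóžna kóžna kóštowaś kóžna"


def recognition_coverage(trained_words, output):
    """Return the trained words that appear at least once in the output."""
    seen = set(output.split())
    return [word for word in trained_words if word in seen]


covered = recognition_coverage(trained, recognized_output)
missing = [word for word in trained if word not in covered]
print(f"{len(covered)} of {len(trained)} words recognized: {covered}")
print(f"never recognized: {missing}")
```

The count only says which trained words ever showed up in the output. It says nothing about whether each recognition matched what was actually spoken.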

Ralf’s Lower Sorbian dictionary

Friday, April 23rd, 2010

Here is how I created Ralf's Lower Sorbian dictionary:

1. Get the spelling dictionary (licensed under GPLv2).

2. Ubuntu terminal:

am3msi@am3msi-desktop:~/Documents/201004/lower-sorbian-dictionary$ unmunch dsb_DE.dic dsb_DE.aff > lower-sorbian-wordlist

This command generated 700,000 Lower Sorbian words. Instead, I will just use dsb_DE.dic, which has 75,000 words, as the source.
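
For readers who want to look at the word list in dsb_DE.dic directly, here is a minimal sketch of reading the Hunspell .dic layout. It is not the method used in this post. It assumes the usual layout: the first line holds the entry count, and each entry may carry affix flags after a slash. The function name read_dic_words and the sample flags are made up for illustration.

```python
def read_dic_words(lines):
    """Yield the bare words from Hunspell .dic lines.

    The first line of a .dic file holds the entry count, and each entry
    may carry affix flags after a slash, e.g. "word/AB".
    Simplification: escaped slashes and morphological fields are ignored.
    """
    entries = iter(lines)
    next(entries, None)  # skip the entry-count line
    for line in entries:
        word = line.strip().split("/", 1)[0]
        if word:
            yield word


# Hypothetical sample in .dic layout; the flags are invented.
sample = ["3", "kóžna/A", "kóštowaś/BC", "pśizemski"]
words = list(read_dic_words(sample))
print(words)
```

To run this on the real file, pass an open file object instead of the sample list. The character encoding to use is declared on the SET line of dsb_DE.aff.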

3. Add <lexicon> tags to the file dsb_DE.dic (<lexicon> at the beginning of the file; </lexicon> at the end of the file).
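
Step 3 can also be scripted instead of done by hand. The following is a rough sketch, and the function name wrap_in_lexicon is made up. It additionally escapes &, < and >, which is an assumption beyond the manual step, so that the wrapped text parses as XML.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_in_lexicon(text):
    """Wrap plain text in a <lexicon> root element, escaping &, < and >."""
    return "<lexicon>\n" + escape(text) + "\n</lexicon>\n"


# Hypothetical sample in .dic layout; the flags are invented.
sample = "3\nkóžna/A\nkóštowaś/BC\npśizemski"
document = wrap_in_lexicon(sample)

# Parsing raises ParseError if the result is not well-formed XML.
root = ET.fromstring(document)
print(root.tag)
```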

4. Ubuntu terminal:

am3msi@am3msi-desktop:~/Documents/201004/lower-sorbian-dictionary$ saxonb-xslt -ext:on -s:dsb_DE.dic -xsl:'' -o:lower-sorbian.xml

5. Use this table to develop the grapheme-to-phoneme conversion in the style sheet improve-lower-sorbian-dictionary.xsl. These are the conversion rules:


6. Generate the PLS dictionary:

am3msi@am3msi-desktop:~/Documents/201004/lower-sorbian-dictionary$ saxonb-xslt -ext:on -s:lower-sorbian.xml -xsl:'improve-lower-sorbian-dictionary.xsl' -o:lower-sorbian-dictionary.xml
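
To illustrate what a rule-based grapheme-to-phoneme step followed by PLS output can look like, here is a small Python sketch. It does not reproduce improve-lower-sorbian-dictionary.xsl. The three rules in RULES are placeholders chosen for illustration and are not the conversion table from step 5. The names to_phonemes and build_pls, the alphabet="ipa" attribute, and the language code dsb are likewise assumptions. Only the element names and the namespace follow the W3C Pronunciation Lexicon Specification.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# PLACEHOLDER rules for illustration only; NOT the author's conversion table.
RULES = {"ch": "x", "š": "ʃ", "ž": "ʒ"}


def to_phonemes(word, rules=RULES):
    """Convert a word greedily, trying the longest grapheme first.

    Characters without a rule are copied through unchanged, so the
    result is only as complete as the rule table.
    """
    longest = max(map(len, rules))
    lowered = word.lower()
    out, i = [], 0
    while i < len(lowered):
        for size in range(longest, 0, -1):
            chunk = lowered[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            out.append(lowered[i])  # no rule matched this character
            i += 1
    return "".join(out)


def build_pls(words, lang="dsb"):
    """Return a PLS document (as a string) with one lexeme per word."""
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for word in words:
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = word
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = to_phonemes(word)
    return ET.tostring(lexicon, encoding="unicode")


print(build_pls(["kóžna", "kóštowaś"]))
```

Trying the longest grapheme first matters as soon as the table contains multi-letter graphemes such as "ch": otherwise "c" and "h" would be converted separately.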

7. Download Ralf's Lower Sorbian dictionary and import it into Simon.

8. Maybe someone (teacher or student) from the Niedersorbisches Gymnasium Cottbus is interested in the further development of Ralf's Lower Sorbian dictionary? I think that this would be an interesting project.