Ralf’s German speech model 0.1.9.4

June 4th, 2012 by producer

This article explains how I am creating version 0.1.9.4 of Ralf’s German speech model. This speech model should contain about 300,000 words. Let’s see whether it works out. Here is what I do:

1. I have imported all German IPA FLAC files into Simon (more than 50,000 FLAC files). The speech model works with about 50,000 words.
2. Import a reduced version of my German PLS dictionary from here: file:///home/linuxmint/Music/preparing-de-0.1.9.4/reduced-german-dictionary-0.2.8.1.xml
Simon > File > Connect. Simon has now been activated automatically. Deactivate Simon. Synchronize.
Words that contain triphones that are not part of the acoustic model will have to be removed from the dictionary.
3. Linux Mint terminal:

cd /home/linuxmint/Music/preparing-de-0.1.9.4
saxonb-xslt -ext:on -s:words-not-found-1 -xsl:analyze.xsl -o:words-not-found-2
saxonb-xslt -ext:on -s:reduced-german-dictionary-0.2.8.1.xml -xsl:compare-missing-graphemes.xsl

4. Delete the scenario demega.
5. Import the scenario demega. Import the base model demega as static base model.
6. Synchronize. Activate Simon and dictate a few words. It is working.
7. Import reduced-german-dictionary-0.2.8.1-1.xml.
8. Disconnect. Connect. Synchronize. Wait a few moments. Activate. An error message appears. Many words consist of sounds that are not covered by the base model.
9. Terminal:

saxonb-xslt -ext:on -s:words-not-found-3 -xsl:analyze.xsl -o:words-not-found-4
saxonb-xslt -ext:on -s:reduced-german-dictionary-0.2.8.1-1.xml -xsl:compare-missing-graphemes.xsl
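Step 2 notes that words containing triphones missing from the acoustic model have to be removed from the dictionary; the article does this with XSLT on the PLS XML files. As a minimal sketch of the same filtering idea, assuming a plain-text word list with one word per line and a second list of words to drop (both file names below are hypothetical), grep can remove exact matches:

```shell
#!/bin/sh
# Hypothetical example files; the article itself works on PLS XML instead.
printf 'Haus\nStraße\nXylophon\n' > wordlist.txt   # full word list
printf 'Xylophon\n' > words-to-remove.txt          # words the base model cannot cover

# -v: keep non-matching lines, -x: match whole lines only,
# -F: treat patterns as fixed strings, -f: read patterns from a file
grep -vxFf words-to-remove.txt wordlist.txt > reduced-wordlist.txt

cat reduced-wordlist.txt
```

Whole-line matching (-x) matters here: without it, removing a short word would also remove every longer word that contains it.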

I won’t explain the next steps because they are just a repetition of the previous steps.

Now you have an impression of how the next version of my speech model was created. The speech model contains 290,000 words.

Will this speech model run on your computer? You have to compile Julius from source with the following option:

./configure --enable-words-int

By default, Julius is limited to 65,535 words. I don’t know what the limit is with this option, but 290,000 words work when Julius is compiled with it.
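The 65,535 figure is exactly the largest value an unsigned 16-bit integer can hold, which suggests that Julius stores word IDs in 16 bits by default. Assuming that --enable-words-int switches them to a signed 32-bit int (an assumption, not something stated above), the arithmetic below shows why 290,000 words need the option and how much room it would leave:

```shell
#!/bin/sh
# Largest value of an unsigned 16-bit word ID (assumed Julius default)
max16=$(( (1 << 16) - 1 ))
# Largest value of a signed 32-bit int (assumed with --enable-words-int)
max32=$(( (1 << 31) - 1 ))
vocab=290000   # size of the German model described above

echo "16-bit limit: $max16"
echo "32-bit limit: $max32"
if [ "$vocab" -gt "$max16" ]; then
  echo "$vocab words do not fit into 16-bit word IDs"
fi
```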

Ralf’s Arabic speech model

May 18th, 2012 by producer

Some words about the creation of this speech model:

1. Download Ralf’s Arabic dictionary.
2. Create a scenario “Arabic”. Clear the shadow vocabulary. Import the dictionary as shadow dictionary.
3. Select 8 Arabic words for training:

اخترا, اختزال, اختزان, جواب, جوائر, نمحى, نمسان, وسوست

4. Grammar Unknown. Dictation plugin. Synchronize. Activate. When I dictate Arabic words, Simon recognizes them, but there is no output. Even switching the keyboard language does not produce a result.
5. Get Ralf’s Arabic speech model.

Ralf’s Vietnamese speech model

May 18th, 2012 by producer

Some words about the creation of this speech model:

1. Download Ralf’s Vietnamese dictionary 0.1.1.
2. New “Vietnamese” scenario. Import the dictionary as shadow dictionary.
3. Train 10 words:

ang, chùng, cuốn, gắm, hộp, khem, lếch, ngỗng, thoang, vung

4. Grammar Unknown. Dictation plugin. Synchronize. Activate. Dictate:

ang vung vung gm hp gm

Some letters are missing: gắm and hộp come out as gm and hp. I tried to fix that by changing the keyboard layout, but it didn’t help.

5. Get Ralf’s Vietnamese speech model.
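One way to narrow down which letters get lost is to check which of the trained words can be represented in ISO-8859-1 (Latin-1). This is only a guess at a possible cause, not something established above; the sketch assumes an iconv that exits with a non-zero status when a character cannot be converted (GNU iconv does).

```shell
#!/bin/sh
# Collect the trained Vietnamese words that contain characters outside
# ISO-8859-1; iconv fails on the first character it cannot convert.
outside=""
for w in ang chùng cuốn gắm hộp khem lếch ngỗng thoang vung; do
  if ! printf '%s' "$w" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1; then
    outside="$outside $w"
  fi
done
echo "Not representable in ISO-8859-1:$outside"
```

Running the same loop over the Romanian training words flags delapidată, dificilă and văcar; the last two are the words whose ă disappears in the Romanian dictation result further down.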


Ralf’s Valencian speech model

May 18th, 2012 by producer

Some words about the creation of this speech model:

1. Download Ralf’s Valencian dictionary.
2. New scenario “Valencian”. Delete the old shadow dictionary. Import Ralf’s Valencian dictionary as shadow dictionary.
3. Add ten Valencian words to training:

ababol, carnificar, contrastada, desencovenat, disputant, encepada, improductible, imperfecta, senatori, tartana

4. Grammar Unknown. Commands: Dictation plugin. Actions > Synchronize. Actions > Activate. Dictate:

senatori carnificar contrastada desencovenat disputant encepada desencovenat senatori tartana

5. Export scenario and base model.
6. Get Ralf’s Valencian speech model.

Ralf’s Tagalog speech model

May 17th, 2012 by producer

Some words about the creation of this speech model.

1. Download Ralf’s Tagalog dictionary.
2. Create a Tagalog scenario. Delete the old shadow vocabulary.
3. Train ten Tagalog words:

alpabeto, balakang, dobleng, kababayan, mababata, makakita, naunang, palaso, payagang, tumatanda

4. Add grammar Unknown. Add dictation plugin.
5. Actions > Synchronize. Actions > Activate. The recognition result is bad:

balakang balakang mababata balakang balakang payagang balakang mababata

6. Get Ralf’s Tagalog speech model.

Ralf’s Swedish speech model

May 17th, 2012 by producer

Some words about this speech model:

1. Download Ralf’s Swedish dictionary. Create a scenario. Clear the shadow vocabulary. Import the dictionary as shadow dictionary.
2. Select ten Swedish words for training: anfalla, geologi, gestalta, getskinnets, inknådad, nämnda, tiotal, tiokamps, tingshus, tingat
3. Grammar: Unknown. Dictation plugin. Synchronize. Activate. Dictate: anfalla geologi gestalta nämnda inknådad nämnda nämnda tingshus tiokamps tiotal
4. Export scenario and base model.
5. Get Ralf’s Swedish speech model.

Ralf’s Swahili speech model

May 17th, 2012 by producer

Some words about the creation of this speech model:

1. Get Ralf’s Swahili dictionary. Create a Simon scenario “Swahili”. Clear shadow vocabulary. Import the dictionary as shadow dictionary.
2. Train ten Swahili words: aridhiana, kutompa, kutonesha, kurene, makuzi, tunaleta, tunalo, wamtupe, zikikata, zikolee
3. Grammar: Unknown. Add Dictation plugin. Actions > Synchronize. Actions > Activate. Dictate a few words:

aridhiana kurene kutompa kutonesha makuzi tunalo makuzi zikikata zikolee

4. Export scenario. Export base model.
5. Download Ralf’s Swahili speech model.

Ralf’s Spanish speech model

May 17th, 2012 by producer

Some words about the creation of this speech model.

1. Get Ralf’s Spanish dictionary.
2. Create a new Spanish scenario.
3. Import Ralf’s Spanish dictionary as shadow dictionary.
4. Train ten words: ababas, activárselos, actuabais, desconté, domingo, envacas, incapacitando, nabla, superviene, vacilas
5. Add Unknown as grammar. Add Dictation plugin. Actions > Synchronize. Actions > Activate.
6. Dictate a few words:

ababas activárselos actuabais actuabais vacilas actuabais superviene ababas incapacitando

7. Export scenario. Export base model.
8. Download Ralf’s Spanish speech model.

Ralf’s Slovenian speech model

May 17th, 2012 by producer

Some words about the creation of this speech model.

1. Get Ralf’s Slovenian dictionary.
2. Create a Simon scenario with the name “Slovenian”.
3. Remove the shadow vocabulary.
4. Import Ralf’s Slovenian dictionary as shadow dictionary.
5. Add ten words to training. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
encijane, encijanovo, enciklika, imenovale, kuretensko, nepozaben, plavolase, Zule, šiponovi, zlomom

Do you want to add them now?

Press the Yes button.

6. Add grammar Unknown. Add Dictation plugin. Actions > Synchronize. Actions > Activate. Dictate a few words:

encijane encijanovo enciklika imenovale šiponovi nepozaben plavolase šiponovi plavolase

7. Export scenario. Export base model.
8. Download Ralf’s Slovenian speech model.

Ralf’s Romanian speech model

May 17th, 2012 by producer

Some words about the creation of this speech model:

1. Get Ralf’s Romanian dictionary 0.1.1.
2. Create a Simon scenario with the name “Romanian”.
3. Delete the shadow vocabulary.
4. Import Ralf’s Romanian dictionary as shadow dictionary (PLS format).
5. Add ten words to training. Press the Train selected words button. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
multicoloara, delapidată, delebil, delectat, diferim, dificilă, diftong, slab, văcar, împânzit

Do you want to add them now?

Press the Yes button.

6. Add grammar “Unknown”. Add dictation plugin.
7. Actions > Synchronize. Actions > Activate. Dictate a few words:

delectat delectat delectat dificil diftong vcar delectat vcar delectat

Not all Romanian letters appear; ă is omitted (dificilă becomes dificil, văcar becomes vcar).

8. I try switching the keyboard language. In Linux Mint (GNOME Classic layout), this is done under Linux Mint > System Settings > Keyboard Layout.

9. Unfortunately, it doesn’t help to switch the keyboard layout. The dictation result is still the same.

10. Export the Romanian scenario. Export the Romanian base model.
11. Download Ralf’s Romanian speech model.

Ralf’s Portuguese (European) speech model

May 17th, 2012 by producer

Some words about the creation of this speech model.

1. Get Ralf's Portuguese (European) dictionary.
2. Create a Simon scenario with the name Portuguese.
3. Clear the shadow vocabulary.
4. Import Ralf's Portuguese (European) dictionary as shadow dictionary.
5. Add ten words to training. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
comprar, comprazemos, comprometedor, feito, felino, fenomenal, incitativo, irresoluto, masculino, telescopia

Do you want to add them now?

Press the Yes button.

6. Add the word “Unknown” as grammar. Add the dictation plugin.
7. Actions > Synchronize. Actions > Activate. Dictate a few words:

comprar comprazemos comprometedor feito felino comprar irresoluto masculino telescopia

8. Export the Portuguese scenario. Export the Portuguese base model.
9. Download Ralf's Portuguese (European) speech model.

Tip: If you are from Brazil, check this out.

Ralf’s Polish speech model

May 17th, 2012 by producer

Some words about the creation of this speech model:

1. Get Ralf’s Polish dictionary.
2. Create a Polish scenario.
3. Delete the shadow vocabulary from my previous scenario.
4. Import Ralf’s Polish dictionary as shadow dictionary.
5. Add 10 words to training. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
Knopik, knorra, pitbul, cedziny, ceglasto, celebracja, celebra, cella, celoteks, frencz

Do you want to add them now?

Press the Yes button.

6. Record the ten words with Simon.
7. Actions > Synchronize. Actions > Activate. It doesn’t work. Why not? I forgot to add the terminal “Unknown” as grammar, and I forgot to add the Dictation plugin.
8. Let’s dictate a few Polish words:

celebra cella cella frencz Knopik knorra pitbul cella celebra

9. Download Ralf’s Polish speech model.

Ralf’s Norwegian Bokmål speech model

May 17th, 2012 by producer

Some words about the creation of this speech model:

1. Get Ralf’s Norwegian Bokmål dictionary 0.1.1.
2. Create a Simon scenario with the name “Norwegian”.
3. Clear the shadow vocabulary, which contains the shadow dictionary from my previous scenario.
4. Import Ralf's Norwegian Bokmål dictionary as shadow dictionary.
5. Train 10 Norwegian words. Simon is asking:

Your vocabulary does not define all words used in this text. These words are missing:
fiskehank, fiskehode, gangvei, gante, hodestup, kampvåpna, kyte, mien, sjako, sylfe

Do you want to add them now?

Press the Yes button.

6. Add the word “Unknown” as grammar. Add the dictation plugin. Actions > Synchronize. Actions > Activate. Let’s dictate a few words:

fiskehank kyte gangvei kyte sjako mien sjako sylfe

7. Export the Norwegian scenario. Export the Norwegian base model.
8. Download Ralf’s Norwegian Bokmål speech model.

Ralf’s Northern Sotho speech model

May 17th, 2012 by producer

Some words about the creation of this speech model:

1. Download Ralf’s Northern Sotho dictionary.
2. Create a Simon scenario with the name NorthernSotho.
3. Import the dictionary as shadow dictionary.
4. I want to train five words.
5. Add the word “Unknown” as grammar. Add the dictation plugin.
6. Actions > Synchronize. Actions > Activate. Here are my recognition results:

dikati ditofo Guest kobokela Guest Guest

Simon recognized Guest instead of Lenhard. Never mind.

7. Download Ralf’s Northern Sotho speech model.

Ralf’s Macedonian speech model

May 17th, 2012 by producer

Some words about the creation of this speech model:

1. Download Ralf’s Macedonian dictionary.
2. Create a Simon scenario with the name Macedonian.
3. Import Ralf’s Macedonian dictionary as shadow dictionary.
4. I want to train ten words. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
босите, босово, ботаничко, ботарева, мусадин, негата, негативата, предните, рипало, рипнува

Do you want to add them now?

Press the Yes button.

5. Define “Unknown” as grammar.
6. Add the Dictation plugin.
7. Press Synchronize. Press Activate. Simon only outputs spaces (because I have configured the dictation plugin to append a space after each recognized word). Perhaps I should change the keyboard layout. I just tried that; unfortunately, it doesn’t solve my problem. But I know that Simon is recognizing the words; the problem is just the output.

8. Download Ralf’s Macedonian speech model.

Ralf’s Lower Sorbian speech model

May 16th, 2012 by producer

Some words about the creation of this speech model:

1. Get the PLS dictionary.
2. Create a Simon scenario with the name LowerSorbian.
3. Import Ralf's Lower Sorbian dictionary into Simon as shadow dictionary.
4. Select a few words that I want to train. For this speech model, I want to train just five words: kóštowaś, kóžkarka, kóžna, pódpažonej, pśizemski.
5. Add the word “Unknown” as grammar.
6. Add the Dictation plugin.
7. Press Synchronize. Press Activate. Simon recognizes only two of the five words:

kóštowaś kóžna kóžna kóžna kóštowaś kóžna

This speech model is really bad, but it demonstrates the concept.

8. Download Ralf’s Lower Sorbian speech model.

Ralf’s Latvian speech model

May 16th, 2012 by producer

Some words about the creation of Ralf’s Latvian speech model:

1. Get Ralf’s Latvian dictionary.
2. Create a Latvian scenario.
3. Import the dictionary into Simon as shadow dictionary.

4. Now I want to train 10 Latvian words. Press the Train selected words button.

5. Simon is asking now:

Your vocabulary does not define all words used in this text. These words are missing:
jāpārkurc, Lauciene, laukumains, olimpiete, olimpiāde, piekliegt, satriecās, uzpletām, žūpība, žūžas

Do you want to add them now?

Press the Yes button.


Ralf’s Hebrew speech model

May 16th, 2012 by producer

Some words about the creation of Ralf’s Hebrew speech model:

1. Add a new Hebrew scenario.

2. Import Ralf’s Hebrew dictionary as shadow dictionary.

3. Now I want to train 10 Hebrew words. Press the Train selected words button.

4. Simon is now asking a question:

Your vocabulary does not define all words used in this text. These words are missing:
פורניר, פורע, פורק, שומני, שומע, שומקום, קומקומי, קומקומו, טומנו, טונו

Do you want to add them now?

Of course, I want to add them now. I just trained these ten Hebrew words with Simon. They are now part of the active vocabulary.


Ralf’s Italian speech model

May 16th, 2012 by producer

Some words about the creation of Ralf’s Italian speech model.

1. I took a look at the Italian frequency list. It is licensed under the LGPL – very good.


Ralf’s Hungarian speech model

May 15th, 2012 by producer

Some words about the creation of Ralf’s Hungarian speech model.

1. I downloaded Ralf’s Hungarian dictionary.

2. Press the Manage scenarios button.
