Try to train the Polish word “JEDEN”

I want to import a small sample dictionary in Sphinx format into simon:

[Screenshot: polish-sphinx]

The source can be found here (I don’t know how long this link will be valid). The dictionary contains 19 Polish words (US-ASCII). Here is what you have to do next:

[Screenshot: universal]

1. Select Applications > Universal Access > simon.

[Screenshot: import-dictionary]

2. Press the Wordlist button.
3. Press Import Dictionary.

[Screenshot: shadow-dictionary]

4. You can select the target: shadow dictionary or active dictionary. For this Polish example dictionary, choose active dictionary.
5. Press the Next button.

And now it is time to choose the appropriate lexicon format:

[Screenshot: import-sphinx]

Import the dictionary (with the 19 Polish words; see the screenshot at the beginning of this post) as a SPHINX lexicon.
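To make the lexicon format more concrete, here is a minimal Python sketch of how a Sphinx-style dictionary is usually laid out: one entry per line, the word first, followed by its phonemes separated by whitespace, with alternative pronunciations marked as `WORD(2)`. The sample entries and phoneme symbols below are made up for illustration and are not copied from the Polish dictionary used in this post, and `parse_sphinx_dictionary` is a hypothetical helper, not part of simon.

```python
import re

# Hypothetical sample entries; the phoneme symbols are illustrative only
# and are NOT taken from the real Polish dictionary used in this post.
SAMPLE = """\
JEDEN J E D E N
DWA D V A
DWA(2) D W A
"""

def parse_sphinx_dictionary(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        word, *phonemes = line.split()
        # "DWA(2)" marks an alternative pronunciation of "DWA"
        word = re.sub(r"\(\d+\)$", "", word)
        entries.setdefault(word, []).append(phonemes)
    return entries

lexicon = parse_sphinx_dictionary(SAMPLE)
for word, pronunciations in lexicon.items():
    print(word, pronunciations)
```

A quick check like this can help to spot malformed lines before handing the file to the import wizard.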

[Screenshot: sphinx-automatic]

You have to select the path to the Polish Sphinx dictionary. After pressing the Next button, the following message appears:

[Screenshot: finish]

The Polish Sphinx dictionary has been imported successfully. Press the Finish button.

Now let’s train a Polish word:

[Screenshot: add-polish]

a. Select the Polish word JEDEN.
b. Add to Training.
c. Train selected Words.

You can now record the Polish word with simon:

[Screenshot: train-polish]

Select Applications > Universal Access > ksimond. ksimond has to be configured first; the details are explained in the ksimond handbook.

Both simon and ksimond must be running. Now let’s take a look at the following screenshot:

[Screenshot: connect-polish]

d. I have recorded the Polish word JEDEN once, as indicated by the number 1 in the right column. How often should a word be recorded? I don’t know for sure, but my experience suggests about 8 times. When I trained 158 German words with simon, I recorded each word about 8 times (some words only 6 times, others 10 times), and 148 of the 158 German words were recognized correctly. So my recommendation is: record each word in the sample Polish Sphinx dictionary 8 times.
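The numbers from step d can be checked with a few lines of Python. This is only arithmetic on the figures quoted above (158 German words, 148 recognized, 19 Polish words, 8 recordings each); the variable names are my own.

```python
# Figures quoted in step d
german_words = 158
german_recognized = 148

# Share of German words that were recognized correctly
recognition_rate = german_recognized / german_words
print(f"Recognition rate: {recognition_rate:.1%}")  # Recognition rate: 93.7%

# Recording effort for the sample Polish dictionary at 8 recordings per word
polish_words = 19
recordings_per_word = 8
total_recordings = polish_words * recordings_per_word
print(f"Recordings needed: {total_recordings}")  # Recordings needed: 152
```

So 8 recordings per word gave a recognition rate of roughly 94 % in the German experiment, and the same effort for the Polish sample dictionary means 152 recordings in total.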

e. Press the Connect button. This makes simon (the client) connect to ksimond (the server) via TCP/IP (maybe this is not 100 % correct, but the concept should be clear).

I pressed the Connect button. Now simon and ksimond are connected. Let’s press the Synchronize button.
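For readers who are not familiar with the client/server idea behind the Connect button, here is a generic Python sketch of a TCP client talking to a TCP server on the same machine. It is not simon’s or ksimond’s actual protocol: the messages `HELLO` and `CONNECTED`, the function names, and the automatically chosen port are all assumptions made for illustration.

```python
import socket
import threading

def run_server(server_sock, received):
    """Accept one client, read its message, and answer (stand-in for a server)."""
    conn, _addr = server_sock.accept()
    with conn:
        received.append(conn.recv(1024).decode("utf-8"))
        conn.sendall(b"CONNECTED")  # hypothetical reply, not ksimond's protocol

def connect_once():
    """Start a local server, connect a client to it, and return both messages."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 lets the operating system pick any free port for this demo
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    received = []
    server_thread = threading.Thread(target=run_server, args=(server_sock, received))
    server_thread.start()

    # The client side: open a TCP connection and exchange one message each way
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"HELLO")  # hypothetical greeting
        answer = client.recv(1024).decode("utf-8")

    server_thread.join()
    server_sock.close()
    return received[0], answer

if __name__ == "__main__":
    print(connect_once())
```

The point is only that one program listens on a port and the other one connects to it; what simon and ksimond actually send to each other is described in their handbooks.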

[Screenshot: connected-deactivated]

f. Press Synchronize.
g. Press Activate.
h. An error message appears:

Couldn’t start recognition because the system reports that the recognition is not ready.

Please check if you have defined a wordlist, an appropriate grammar and recorded a few trainings samples.

The system will then, upon synchronization, generate the model which will be used for the recognition.

i. What does Connected but Deactivated mean? It means that a TCP/IP connection between simon and ksimond exists, but simon doesn’t recognize any words yet.

j. Why doesn’t simon recognize any words? I didn’t define an appropriate grammar. Without a grammar, recognition is not possible.

k. The next step would be to press the Grammar button, but I want to finish this article here. The simon handbook (PDF) explains how to define a grammar.
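The situation in steps h to j can be summarized as a small decision rule. The following Python sketch is a simplified model based only on the error message quoted above (wordlist, grammar, training samples); it is not simon’s real logic, and apart from “Connected but Deactivated” the status labels and the function name are my own assumptions.

```python
def recognition_status(connected, has_wordlist, has_grammar, sample_count):
    """Simplified model of when recognition can be activated (an assumption)."""
    if not connected:
        return "Disconnected"  # hypothetical label
    if has_wordlist and has_grammar and sample_count > 0:
        return "Connected and Activated"  # hypothetical label
    # Connection exists, but at least one prerequisite is missing
    return "Connected but Deactivated"

# The state reached in this article: wordlist imported, JEDEN recorded once,
# but no grammar defined yet.
print(recognition_status(connected=True, has_wordlist=True,
                         has_grammar=False, sample_count=1))
```

With a grammar defined, the same call would no longer end up in the deactivated branch of this model.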

You learned in this article how to
- import a Polish Sphinx dictionary;
- record the Polish word JEDEN;
- connect to ksimond.
