I want to import a small sample dictionary in Sphinx format into simon:
The source can be found here (I don’t know how long this link will be valid). The dictionary contains 19 Polish words (US-ASCII). Here is what you have to do next:
1. Select Applications > Universal Access > simon.
2. Press the Wordlist button.
3. Press Import Dictionary.
4. You can select the target: shadow dictionary or active dictionary. For this Polish example dictionary, choose active dictionary.
5. Press the Next button.
And now it is time to choose the appropriate lexicon format:
Import the dictionary (the 19 Polish words; see the screenshot at the beginning of this post) as a SPHINX lexicon.
You have to select the path to the Polish Sphinx dictionary. After pressing the Next button, the following message appears:
The Polish Sphinx dictionary has been imported successfully. Press the Finish button.
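For readers who have not seen a Sphinx lexicon before: it is a plain text file with one entry per line, the word followed by its phonemes, separated by whitespace. Alternative pronunciations of the same word carry a numeric suffix such as WORD(2). The following Python sketch parses such a file. The sample lines and their phoneme symbols are made up for illustration and are not taken from the Polish dictionary used in this post, and `parse_sphinx_lexicon` is a hypothetical helper name, not part of simon.

```python
import re
from collections import defaultdict

# Matches an alternative-pronunciation suffix such as "(2)" at the end of a word.
_VARIANT = re.compile(r"\(\d+\)$")


def parse_sphinx_lexicon(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines; ";;;" is treated as a comment marker, as in CMUdict.
        if not line or line.startswith(";;;"):
            continue
        word, *phonemes = line.split()
        # "DWA(2)" is a second pronunciation of "DWA", so drop the suffix.
        word = _VARIANT.sub("", word)
        lexicon[word].append(phonemes)
    return dict(lexicon)


# Illustrative entries only; the phoneme symbols are assumptions.
SAMPLE = """\
JEDEN J E D E N
DWA D V A
DWA(2) D W A
"""

if __name__ == "__main__":
    for word, pronunciations in parse_sphinx_lexicon(SAMPLE).items():
        print(word, pronunciations)
```

A quick check like this can help to spot malformed lines before importing a dictionary into simon.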
Now let’s train a Polish word:
a. Select the Polish word JEDEN.
b. Press Add to Training.
c. Press Train selected Words.
You can now record the Polish word with simon:
Select Applications > Universal Access > ksimond. ksimond has to be configured first; the details are explained in the ksimond handbook.
Both simon and ksimond must be running. Now let’s take a look at the following screenshot:
d. I have recorded the Polish word JEDEN once, as indicated by the number 1 in the right column. How often should a word be recorded? I don’t know for sure, but about 8 times seems to be a good number. When I trained 158 German words with simon, I recorded each word about 8 times (some only 6 times, others 10 times), and 148 of the 158 words (about 94 %) were recognized correctly. Based on that experience, I suggest recording each word in the sample Polish Sphinx dictionary 8 times.
e. Press the Connect button. This means that simon (the client) connects to ksimond (the server) via TCP/IP (this may not be 100 % accurate, but the concept should be clear).
I pressed the Connect button. Now simon and ksimond are connected.
f. Press Synchronize.
g. Press Activate.
h. An error message appears:
Couldn’t start recognition because the system reports that the recognition is not ready.
Please check if you have defined a wordlist, an appropriate grammar and recorded a few trainings samples.
The system will then, upon synchronization, generate the model which will be used for the recognition.
i. What does Connected but Deactivated mean? It means that a TCP/IP connection between simon and ksimond exists, but simon doesn’t recognize any words yet.
j. Why doesn’t simon recognize any words? Because I didn’t define an appropriate grammar. Without a grammar, recognition is not possible.
k. The next step would be to press the Grammar button, but I want to end this article here. The simon handbook (PDF) explains how to define a grammar.
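To make steps e and i a bit more concrete: "connected" only means that a TCP connection to the server could be opened. The following Python sketch tests whether anything is listening at a given host and port. The host `localhost` and the port `4444` are assumptions on my part; use the values from your own ksimond configuration. The sketch only opens and closes a socket and does not speak simon's protocol, and `server_reachable` is a hypothetical helper name.

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Assumed values; replace them with the address and port ksimond uses.
    host, port = "localhost", 4444
    state = "reachable" if server_reachable(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

A positive result covers only the transport part of what the Connect button does. As the error message above says, recognition additionally needs a wordlist, a grammar and training samples, from which the model is generated upon synchronization.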
In this article you learned how to
- import a Polish Sphinx dictionary;
- record the Polish word JEDEN;
- connect to ksimond.










