Only a subset is recognized

This is the current situation: I increased the size of the active vocabulary from 18000 to 300000 words. When I export my current scenario, I get a 58 MB XML file named ralfdic-scenario-20100725.xml (license: GPLv3). For comparison, my phonetic dictionary, Ralf's German dictionary 0.1.9.9, is 45 MB. This means that almost all words in Ralf's German dictionary are included in my current simon scenario. Only the affected words are excluded from the active vocabulary (just about 1000 words are affected).

But what happens when I dictate? simon obviously recognizes only a subset of 18000 words (only words that are included in Ralf's German IPA FLAC files). Why is that?

Take a look at file:///home/ubuntu/.kde/share/apps/simon/model/prompts (license of this file: GPLv3). My current prompts file contains only 18000 words, not 300000. But my German scenario file ralfdic-scenario-20100725.xml contains 300000 words. When I dictate, simon only recognizes words that are included in the prompts file; it doesn’t recognize the other words in the active vocabulary.

My current prompts file is a subset of my current active vocabulary.
My current active vocabulary is a superset of my current prompts file.
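A quick way to see how large the gap is: compare the words in the prompts file with the words in the active vocabulary. The sketch below assumes that each line of the prompts file holds a sample ID followed by the space-separated words spoken in that recording, and that words match case-insensitively. Both are assumptions about the file format, and the function names are hypothetical.

```python
from typing import Iterable, Set


def prompt_words(prompts_text: str) -> Set[str]:
    """Collect every (upper-cased) word used in a prompts file.

    Assumed format: one recording per line, the sample ID first,
    followed by the space-separated words spoken in that sample.
    """
    words: Set[str] = set()
    for line in prompts_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank lines or lines without a transcription
        words.update(token.upper() for token in tokens[1:])
    return words


def untrained_words(vocabulary: Iterable[str], prompts_text: str) -> Set[str]:
    """Return vocabulary entries that never occur in the prompts file."""
    trained = prompt_words(prompts_text)
    # Case-insensitive comparison is an assumption, not simon's documented rule.
    return {word for word in vocabulary if word.upper() not in trained}


if __name__ == "__main__":
    prompts = "sample_001 HALLO WELT\nsample_002 GUTEN MORGEN\n"
    vocab = ["Hallo", "Welt", "Guten", "Morgen", "Computer"]
    print(sorted(untrained_words(vocab, prompts)))  # ['Computer']
```

In this situation the result would be the roughly 282000 vocabulary words that have no recordings, i.e. the words simon currently ignores when dictating.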

How can I solve this issue? I want simon to recognize words that haven’t been trained before. Even words that are marked in red in the active vocabulary should be recognized. I don’t want to record all 300000 words. This issue should be solvable, but how?

5 Responses to “Only a subset is recognized”

  1. Peter Grasch says:

    Simple: Create your model and set the created model files as the static base model.

    The remove-words-that-are-not-trained feature is a safety feature and only applies to user-generated models.

  2. producer says:

    OK. I will try that. Thanks for the info, Peter.

  3. producer says:

    Sorry, I tried it with the static model option. But simon still recognizes only words that have been explicitly trained. The other words in the vocabulary are not recognized.

  4. producer says:

    This is what I am trying now:
    - delete the active vocabulary,
    - delete the current simon scenario,
    - import ralfdic-scenario-20100725.xml.

    simon still only recognizes the words that are part of the prompts file, even though I am using the static model option.

  5. Peter Grasch says:

    Well, when using a static model, the vocabulary shouldn’t be checked.

    You can check whether the untrained words are in the vocabulary file: ~/.kde/share/apps/simond/models//active/model.voca

    By the way, SAM doesn’t do any of these safety checks at all if you serialize prompts and scenarios separately.

    Regards,
    Peter
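Peter's suggestion, checking whether the untrained words actually made it into model.voca, can be scripted. This is a minimal sketch that assumes a Julius-style .voca layout: lines starting with "%" open a word category, and every other non-empty line starts with the word, followed by its phonemes. The script name, function names, exact (case-sensitive) matching, and UTF-8 encoding are all assumptions; the model directory path is left as in the original, so pass the full path yourself.

```python
import sys
from typing import Iterable, List, Set


def voca_words(voca_text: str) -> Set[str]:
    """Collect the words listed in a Julius-style .voca file (assumed format)."""
    words: Set[str] = set()
    for raw in voca_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and '% CATEGORY' headers
        words.add(line.split()[0])  # first field is the word, the rest are phonemes
    return words


def missing_from_voca(candidates: Iterable[str], voca_text: str) -> List[str]:
    """Return the candidate words that do not appear in the .voca text."""
    present = voca_words(voca_text)
    return [word for word in candidates if word not in present]


if __name__ == "__main__":
    # Hypothetical usage: python check_voca.py /path/to/model.voca Wort1 Wort2 ...
    if len(sys.argv) < 2:
        sys.exit("usage: check_voca.py MODEL_VOCA [WORD ...]")
    with open(sys.argv[1], encoding="utf-8") as handle:
        text = handle.read()
    for word in missing_from_voca(sys.argv[2:], text):
        print("not in vocabulary file:", word)
```

If the red (untrained) words are missing from model.voca even with the static model option, the vocabulary is still being filtered before the model is compiled; if they are present, the problem lies elsewhere.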