A few minutes ago, I deleted several files from the /home/ubuntu/.kde/share/config/ folder. Then I repeated steps 1 – 4. Then I pressed the following buttons: Connect – Synchronize – Activate. And then, this error message appeared:
Obviously, a lot of words beginning with the letter b are affected. I think that maybe something went wrong with section xdb.
I found the mistake: The file:///home/ubuntu/Documents/201006/audacity/xdb-folder/prompts-xdb contains wrong content. I will have to fix that.
It is working now.
Keep in mind that this might be intended behavior.
If you have multiple pronunciations per word, the model compilation will likely pick one over the other, essentially leaving the other untrained. Because the model optimization happens at the lexeme level (not the phoneme level), both pronunciations will end up in your active vocabulary.
When Julius tries to start, it needs to find HMMs for each triphone in your active vocabulary, which might fail because of this problem (a different pronunciation was chosen during the alignment).
Of course this problem subsides in large models as you will cover every triphone eventually…
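To make the failure mode above concrete, here is a minimal sketch in Python. It is not how simon or Julius implement this. The lexicon, the phoneme symbols, the `sil` padding at word boundaries, and the helper names `triphones` and `missing_triphones` are all assumptions made for illustration.

```python
def triphones(phonemes):
    """Return the left-centre+right triphones of one pronunciation.

    Assumption: word boundaries are padded with "sil"; real toolchains
    may build boundary contexts differently.
    """
    padded = ["sil"] + list(phonemes) + ["sil"]
    return {
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    }


def missing_triphones(lexicon, chosen):
    """Triphones the active vocabulary needs but the training never saw.

    lexicon: word -> list of alternative pronunciations (lists of phonemes)
    chosen:  word -> index of the pronunciation the alignment picked
    """
    trained = set()
    needed = set()
    for word, pronunciations in lexicon.items():
        if word not in chosen:
            # Untrained word: lexeme-level pruning removes it entirely.
            continue
        trained |= triphones(pronunciations[chosen[word]])
        # The word stays active with *all* of its pronunciations.
        for pronunciation in pronunciations:
            needed |= triphones(pronunciation)
    return needed - trained


# Hypothetical toy lexicon: one word with two alternative pronunciations.
LEXICON = {"weg": [["v", "e:", "k"], ["v", "E", "k"]]}
CHOSEN = {"weg": 0}  # the alignment picked the first variant

print(sorted(missing_triphones(LEXICON, CHOSEN)))
```

In this toy case the second variant contributes three triphones that were never trained, which is the kind of gap that can stop the recognizer from starting.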
Regards,
Peter
Hello Peter!
As a work-around, I am using the XPath expression select="phoneme[1]" in compare.xsl.
This is the reason why I built Ralf's German IPA FLAC files: I want to work on the <phoneme> level (and not on the <grapheme> level). The format IPA.flac should be a step in this direction.
I have recorded more than 20000 German IPA.flac files (and uploaded them to Voxforge). Most triphones should be caught by these 20000 audio files.
Regards,
Ralf
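For readers who do not use XSLT, the effect of the select="phoneme[1]" work-around can be sketched in Python. The XML layout below (a `lexicon` root with `lexeme`, `grapheme` and `phoneme` elements), the sample entries, and the function name `first_pronunciations` are assumptions for illustration. The actual input of compare.xsl is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real lexicon file may be structured differently.
SAMPLE = """
<lexicon>
  <lexeme>
    <grapheme>weg</grapheme>
    <phoneme>v e: k</phoneme>
    <phoneme>v E k</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tag</grapheme>
    <phoneme>t a: k</phoneme>
  </lexeme>
</lexicon>
"""


def first_pronunciations(xml_text):
    """Map each grapheme to its first listed pronunciation only."""
    root = ET.fromstring(xml_text)
    result = {}
    for lexeme in root.findall("lexeme"):
        grapheme = lexeme.findtext("grapheme")
        # Same positional predicate as in the XSLT: first <phoneme> child.
        first = lexeme.find("phoneme[1]")
        if grapheme is not None and first is not None:
            result[grapheme] = first.text
    return result


print(first_pronunciations(SAMPLE))
```

Note that every alternative after the first one is silently dropped, so the choice of pronunciation is made before the alignment ever sees the data.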
So you are naming your files after the first pronunciation found? How is this a step forward? All you do is lose the advantage of multiple alternative pronunciations and force the compilation into what might be the wrong pronunciation.
The compilation itself does the alignment on the phoneme level, of course, and it _should_ have the choice for itself. The model optimization of simon (removing untrained words before the recognition starts) works on the lexeme level, which is quite all right because phonemes that are not covered wouldn’t be recognized anyway.
Regards,
Peter