The active vocabulary contains 40 words. I just dictated all 40 words into gedit. 37 words were recognized correctly and 3 were recognized incorrectly. This is the recognized text:
abnahmen
beweist
Christoph
zeichnet
erzielt
extrem
falsch
DistanzFrist
Frist
gebildet
gebildetem
gemeinsam
gereist
geringem
Handlung
Hauptbahnhof
ist
Jahrhundert
Kalifornien
Landwirtschaft
Leistung
Leistung
Management
natürlich
niedrigem
offiziell
optimistisch
organisiert
positives
Produkt
Professor
Schauspieler
Sicherheit
Silvester
Sonntag
Transport
unterhielt
zeichnet
Zuschauer
Zweifel
1. The word fɛstgəʃtɛlt had been trained three times.
2. Should I train it again?
3. Or should I remove it?
4. Let’s take a look at the pronunciation of fɛstgəʃtɛlt: f E s t g @ S t E l t. What exactly is the cause of the misrecognition? Why did simon recognize the two words DistanzFrist instead of the word festgestellt?
Distanz = dɪstants = d I s t a n ts
Frist = fʀɪst = f R I s t
One problem could be that the st in fɛstgəʃtɛlt and the st in dɪstants are pronounced differently, but the lexicon does not distinguish between them (both are transcribed as s t).
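To put a rough number on how close the two transcriptions are at the lexicon level, one could compute an edit distance over the SAMPA symbols. This is only a sketch under assumptions: it treats each space-separated symbol as one phoneme, it gives every insertion, deletion and substitution the same cost, and it is not how simon's acoustic model compares words. Because the lexicon writes both st clusters as s t, this comparison cannot see the pronunciation difference described above.

```python
def phoneme_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over phoneme symbols (every edit costs 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# SAMPA transcriptions from the post; "ts" is kept as a single symbol.
festgestellt = "f E s t g @ S t E l t".split()
distanz_frist = "d I s t a n ts f R I s t".split()
print(phoneme_edit_distance(festgestellt, distanz_frist))
```

A small distance relative to the word lengths would suggest the lexicon entries themselves are confusable; a large one points more toward the acoustic model or the training samples.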
fɛstgəʃtɛlt = three syllables
dɪstantsfʀɪst = three syllables
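The syllable counts above can be checked mechanically by counting vowel symbols in the SAMPA transcriptions, since each German syllable has one vowel nucleus. The vowel set below is an assumption covering common German SAMPA vowels and diphthongs; it ignores syllabic consonants and the vocalized r (6), so it is only an approximation.

```python
# Assumed set of German SAMPA vowel symbols (short, long, schwa, diphthongs).
GERMAN_SAMPA_VOWELS = {
    "I", "E", "a", "O", "U", "Y", "9", "@",
    "i:", "e:", "E:", "a:", "o:", "u:", "y:", "2:",
    "aI", "aU", "OY",
}

def count_syllables(sampa: str) -> int:
    """Approximate syllable count: one syllable per vowel symbol."""
    return sum(1 for symbol in sampa.split() if symbol in GERMAN_SAMPA_VOWELS)

print(count_syllables("f E s t g @ S t E l t"))     # 3
print(count_syllables("d I s t a n ts f R I s t"))  # 3
```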
The similar length could be one reason for the misrecognition. You can speak a word clearly but slowly, or clearly and fast. The speed at which you speak can influence the recognition result.
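One way to make "too slow or too fast" concrete is to compare speech rates in syllables per second between the training samples and the dictation. The sketch below assumes you have measured the duration of each utterance yourself (for example in an audio editor); the durations in the example are hypothetical placeholders, not measurements from this post, and no threshold for "too different" is implied.

```python
from statistics import mean

def speech_rate(syllables: int, duration_s: float) -> float:
    """Syllables per second for one utterance."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return syllables / duration_s

def tempo_ratio(syllables: int, training_durations_s: list[float],
                dictation_duration_s: float) -> float:
    """Dictation rate divided by mean training rate.

    > 1.0 means the word was dictated faster than it was trained,
    < 1.0 means it was dictated more slowly.
    """
    training_rate = mean(speech_rate(syllables, d) for d in training_durations_s)
    return speech_rate(syllables, dictation_duration_s) / training_rate

# Hypothetical durations in seconds: three training samples, one dictation.
print(round(tempo_ratio(3, [0.9, 1.0, 1.1], 0.7), 2))
```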
5. Should I remove the word festgestellt from the active vocabulary? I am not sure at the moment.
6. Let’s take a look at the sound waves of fɛstgəʃtɛlt and dɪstantsfʀɪst (spoken as if the two words were a single word).
Do the two sound waves look similar? I don’t think so. But I am wondering whether I should take a closer look at the sound waves of the st in fɛstgəʃtɛlt and in dɪstants. Maybe they are quite different? And maybe we should adjust the pronunciation dictionary? There are several possible causes for the misrecognition:
a. The dictionary could be designed better (different symbols for the different st sounds).
b. One or more of my training samples were poorly spoken (or recorded with clipping / x-runs).
c. I was dictating too slowly or too fast (a different dictation speed during recognition than during training).
d. simon needs more training samples.
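Cause b can be partly checked without listening to every sample: a recording with clipping contains many samples at or near full scale. The sketch below assumes the training samples are available as 16-bit PCM WAV files; the file names you pass in and the default threshold are assumptions, and it does not detect x-runs (dropouts), only clipping.

```python
import array
import sys
import wave

def clipping_ratio(path: str, threshold: int = 32767) -> float:
    """Fraction of 16-bit samples whose absolute value reaches `threshold`."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit, native byte order
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)

if __name__ == "__main__":
    # Usage: python check_clipping.py sample1.wav sample2.wav ...
    for path in sys.argv[1:]:
        print(f"{path}: {clipping_ratio(path):.4%} of samples at full scale")
```

A sample with a noticeably higher ratio than the others would be a candidate for re-recording before retraining.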


Hi!
You should not assume that words which are misrecognized need more training than the others. The acoustic model is a delicate thing, and every sample affects the recognition of every word. In my experience, it helps to have about the same amount of training data for each word.
For 40 words, I would recommend a recognition rate (the number of training samples per word) of at least 7 or 8 for reliable recognition.
Greetings,
Peter
Hi, thanks for the hint. I will keep that in mind when retraining the words.
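To follow the advice of keeping training data balanced, a retraining plan can be computed from the per-word sample counts. In this sketch every word is topped up to a common target (the larger of the minimum and the current highest count) instead of deleting samples; the minimum of 7 comes from the comment above, and all counts in the example except the three samples for festgestellt are hypothetical.

```python
def samples_to_add(sample_counts: dict[str, int], minimum: int = 7) -> dict[str, int]:
    """Extra samples each word needs so that all words reach the same count."""
    if not sample_counts:
        return {}
    target = max(minimum, max(sample_counts.values()))
    return {word: target - count
            for word, count in sample_counts.items()
            if count < target}

# Only "festgestellt": 3 is from the post; the other counts are made up.
counts = {"festgestellt": 3, "Leistung": 5, "Zweifel": 8}
print(samples_to_add(counts))  # {'festgestellt': 5, 'Leistung': 3}
```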