This is what I did today: I imported the German PLS dictionary into simon, and created an additional PLS dictionary. Of course, I imported this additional dictionary into simon, too.
I copied /home/liberty/.kde/share/apps/simon/model/lexicon to /home/liberty/200908/sam/michverstanden/lexicon. Then I copied /home/liberty/.kde/share/apps/simon/model/model.voca to /home/liberty/200908/sam/michverstanden/model.voca. After that, I configured sam with the parameters that are stored in the file /home/liberty/200908/sam/michverstanden/michverstanden.sam.
I want to build a speech model using the German 01 prompts. I have these prompts in 16kHz / 16 bit from Voxforge: ralfherzog-20070816_de1.tgz. I made some modifications to the PROMPTS file (Ä instead of ä; Ö instead of ö; Ü instead of ü, SS instead of ß).
I tried to build the model with sam, but an error message occurred:

I don’t know how to solve this problem. Well, I have had some experience with the phoneme & in the past:
1. Ampersand (g & N @) could be compiled
2. model.voca: changing verb to noun
Obviously, the phoneme & has to be defined. But how could that be achieved? From my point of view, we could omit this phoneme, and replace it with the phoneme E. This means that I could try to solve the problem by exchanging the phoneme & with the phoneme E in the following files with gedit:
file:///home/liberty/200908/sam/michverstanden/lexicon
file:///home/liberty/200908/sam/michverstanden/model.voca
Maybe I will try that later.
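Instead of doing the replacement by hand in gedit, it could be scripted. Here is a minimal sketch in Python, assuming that every lexicon line has the form WORD [Output] phoneme phoneme ... as in the entries quoted in this post. The function name replace_phoneme is my own, and model.voca has a different layout, so it would probably need its own pattern.

```python
import re

# Assumed lexicon line layout: WORD [Output] p1 p2 p3 ...
LEXICON_LINE = re.compile(r"^(\S+)\s+(\[.*?\])\s+(.*)$")

def replace_phoneme(line, old, new):
    """Replace whole phoneme tokens equal to `old` with `new` in one lexicon line.

    Only the phoneme part of the line is touched, so a word that happens to
    contain the same character is left alone.
    """
    line = line.rstrip("\n")
    match = LEXICON_LINE.match(line)
    if match is None:
        return line  # unrecognized line: leave it untouched
    prefix = line[:match.start(3)]  # word, [output] and original spacing
    tokens = [new if tok == old else tok for tok in match.group(3).split()]
    return prefix + " ".join(tokens)

# Hypothetical entry, only for illustration:
print(replace_phoneme("WORT [Wort] g & N @", "&", "E"))
```

Replacing token by token, rather than with a plain search and replace, avoids changing an & that is part of a word instead of a phoneme.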
Edit: I just replaced the phoneme & with E in the files lexicon and model.voca (same path as before). Then I tried to build the model with sam. Now sam displays the following message:
Phoneme undefined: Z
Well, I think that I have to train these phonemes. So it would have been sufficient to train the phoneme &. Probably, the German 01 prompts don’t contain the phonemes Z and &. So I should include prompts that contain these phonemes. Example for the phoneme Z:
IMAGE [Image] I m I Z
I think that this entry should be fixed (to I m I d Z). But not now.
I think that I will insert two single words that contain the phonemes Z and &. And I must not forget to add these entries to the prompts file.
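If my guess is right that a phoneme is "undefined" when no prompt covers it, all missing phonemes could be listed in one pass instead of one build attempt at a time. The following Python sketch models only that guess and not what sam really checks. The function names are my own, and it assumes lexicon lines of the form WORD [Output] phonemes and prompts lines of the form filename WORD WORD ...

```python
import re

# Assumed lexicon line layout: WORD [Output] p1 p2 p3 ...
LEXICON_LINE = re.compile(r"^(\S+)\s+\[.*?\]\s+(.*)$")

def parse_lexicon(lines):
    """Map each word to the set of phonemes used in its pronunciation(s)."""
    lexicon = {}
    for line in lines:
        match = LEXICON_LINE.match(line.strip())
        if match:
            lexicon.setdefault(match.group(1), set()).update(match.group(2).split())
    return lexicon

def prompt_words(lines):
    """Collect the words of all prompts (everything after the file name)."""
    words = set()
    for line in lines:
        words.update(line.split()[1:])
    return words

def untrained_phonemes(lexicon, trained_words):
    """Phonemes that occur in the lexicon but in none of the trained words."""
    trained = set()
    for word in trained_words:
        trained.update(lexicon.get(word, set()))
    used = set().union(*lexicon.values()) if lexicon else set()
    return sorted(used - trained)

lexicon = parse_lexicon([
    "IMAGE [Image] I m I Z",
    "UNGLÜCKS [Unglücks] U n g l Y k s",
])
print(untrained_phonemes(lexicon, prompt_words(["ungluecks UNGLÜCKS"])))
```

Every phoneme in the printed list would need at least one recorded word that contains it, or the lexicon entries that use it would have to be removed.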
Edit September 4, 2009: I recorded the wav file job-gaenge.wav with Audacity. Then I applied the following command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox job-gaenge.wav -r 16000 -c 1 -s job_gaenge.wav
Now I have the file job_gaenge.wav in my training folder. It is now necessary to modify the prompts file:
file:///home/liberty/200908/sam/michverstanden/prompts
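Before rebuilding, it may be worth confirming that the converted file really has the same format as the Voxforge prompts (16 kHz, 16 bit, one channel). This is a small sketch with Python's standard wave module. The function name check_wav_format is my own, and the expected values are taken from the sox command above.

```python
import wave

def check_wav_format(path, rate=16000, channels=1, sample_bytes=2):
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()}, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channel(s), expected {channels}")
        if wav.getsampwidth() != sample_bytes:
            problems.append(f"{wav.getsampwidth() * 8} bit, expected {sample_bytes * 8} bit")
    return problems
```

An empty list means that the file matches. For example, check_wav_format("job_gaenge.wav") could be run inside the training.data folder.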
The next step would be to build the speech model with sam. I will do that now. I just started sam. I have to open the file /home/liberty/200908/sam/michverstanden/michverstanden.sam. When trying to build the model, the following error message occurred:
Phoneme undefined: y
OK, I will have to define this phoneme, too. Now I will apply the following command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox ungluecks_.wav -r 16000 -c 1 -s ungluecks.wav
What is the problem now? The following error message appeared:
Error while coding the samples!
Please check the path to HCopy (/usr/local/bin/HCopy) and the wav config (/home/liberty/200908/sam/michverstanden/wav_config)
OK, I understand: I made a small mistake. I had added to the prompts file the following line:
ungluecks.wav UNGLÜCKS
This was wrong. The following line is the correct one:
ungluecks UNGLÜCKS
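This kind of slip could be caught automatically. Here is a minimal Python sketch, assuming that each line of the prompts file consists of the file name without extension followed by the words. The function name normalize_prompt_line is my own.

```python
def normalize_prompt_line(line):
    """Strip a trailing .wav from the file name field of one prompts line."""
    fields = line.split()
    if not fields:
        return ""
    name = fields[0]
    if name.lower().endswith(".wav"):
        name = name[:-len(".wav")]
    return " ".join([name] + fields[1:])

print(normalize_prompt_line("ungluecks.wav UNGLÜCKS"))
```

Running every line of the prompts file through such a function before a build would remove this source of the HCopy error.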
A small mistake, and it doesn’t work. And again the same error message:
Phoneme undefined: y
I understand my mistake. Take a look into the lexicon:
UNGLÜCKS [Unglücks] U n g l Y k s
The Y and the y are different phonemes. I will train the following entry:
ÄGYPTEN [Ägypten] E g y p t n=
I don’t know why we distinguish between Y and y. The rule itself can be found in Wiktionary:
[y] U+0079 only in foreign words: Physik /[fyˈsɪk]/
[ʏ] U+028F dünn /[dʏn]/, lüften /[ˈlʏftn̩]/, Symbol /[zʏmˈboːl]/
When I submit words for the dictionary acquisition project, I try to follow this rule. I don’t understand the point of this rule, but it is a rule. We will have to discuss this issue. I applied the following command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox aegypten-aegypten.wav -r 16000 -c 1 -s aegypten_aegypten.wav
Another problem occurs:
Phoneme undefined: E:
I will add the following word:
ANSCHLÄGE [Anschläge] a n S l E: g @
I am executing the command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox anschlaege-anschlaege.wav -r 16000 -c 1 -s anschlaege_anschlaege.wav
OK, another phoneme is missing:
Phoneme undefined: OY
I will take the following entry:
MEHRWERTSTEUER [Mehrwertsteuer] m e: @ r v e: @ r t S t OY @ r
I am executing the command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox mehrwertsteuer-mehrwertsteuer.wav -r 16000 -c 1 -s mehrwertsteuer_mehrwertsteuer.wav
OK, another problem:
Phoneme undefined: an
There is obviously an error in the lexicon:
ANFÄNGE [Anfänge] an fEN@
This is the corresponding entry in the PLS dictionary that had been imported:
<lexeme>
<grapheme>Anfänge</grapheme>
<phoneme>an.fɛŋə</phoneme>
</lexeme>
I will delete this entry from the following lexicon:
file:///home/liberty/200908/sam/michverstanden/lexicon
I might have to do that again when I replace this lexicon with a new one. So this is a good reminder for me.
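The PLS entry writes the phonemes without spaces, which seems to be why the imported lexicon line ends up with "an" and "fEN@" as if they were single phonemes. Here is a rough Python sketch of how such an IPA string could be split into one token per symbol. The mapping table is my own and incomplete: E, N, @ and Y follow the entries quoted in this post, the other lines are standard SAMPA, and diphthongs such as OY are not merged.

```python
# Partial IPA -> SAMPA table (illustrative; unlisted symbols pass through).
IPA_TO_SAMPA = {
    "\u025b": "E",  # ɛ
    "\u014b": "N",  # ŋ
    "\u0259": "@",  # ə
    "\u028f": "Y",  # ʏ
    "\u026a": "I",  # ɪ
    "\u028a": "U",  # ʊ
    "\u0283": "S",  # ʃ
    "\u0254": "O",  # ɔ
}
# Marks that modify the preceding symbol instead of starting a new token.
SUFFIXES = {"\u02d0": ":", "\u0329": "="}  # length mark ː, syllabic mark
# Syllable boundary and stress marks carry no phoneme.
SKIP = {".", "\u02c8", "\u02cc"}

def ipa_to_sampa_tokens(ipa):
    """Split an unspaced IPA string into a list of SAMPA-like phoneme tokens."""
    tokens = []
    for ch in ipa:
        if ch in SKIP:
            continue
        if ch in SUFFIXES:
            if tokens:
                tokens[-1] += SUFFIXES[ch]
            continue
        tokens.append(IPA_TO_SAMPA.get(ch, ch))
    return tokens

print(" ".join(ipa_to_sampa_tokens("an.f\u025b\u014b\u0259")))
```

With something like this, the entry could be repaired instead of deleted, so it would survive the next import of the dictionary.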
OK, next problem: Phoneme undefined: dUNkl=
I think I know what the problem is. Next problem: Phoneme undefined: UnmItl=ba:rstn=
I have to delete the following lines:
UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=
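Such entries could be found in one pass over the lexicon by checking each token against a list of known phonemes. The following Python sketch does that. The inventory KNOWN_PHONEMES is only collected from the correctly spaced entries quoted in this post, so it is an assumption and certainly incomplete. The function name is my own.

```python
import re

# Assumed lexicon line layout: WORD [Output] p1 p2 p3 ...
LEXICON_LINE = re.compile(r"^(\S+)\s+\[.*?\]\s+(.*)$")

# Illustrative inventory, gathered from entries in this post (not complete).
KNOWN_PHONEMES = set(
    "a a: b d e: E E: f g I k l l= m n n= N OY p r s S t U v y Y Z @".split()
)

def suspicious_entries(lines, known=KNOWN_PHONEMES):
    """Return (word, unknown_tokens) for every line with tokens outside `known`."""
    result = []
    for line in lines:
        match = LEXICON_LINE.match(line.strip())
        if match is None:
            continue
        unknown = [tok for tok in match.group(2).split() if tok not in known]
        if unknown:
            result.append((match.group(1), unknown))
    return result

for word, tokens in suspicious_entries([
    "ANFÄNGE [Anfänge] an fEN@",
    "ANSCHLÄGE [Anschläge] a n S l E: g @",
    "UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r",
]):
    print(word, tokens)
```

The flagged lines are the candidates for deletion or repair. The correctly spaced entry for ANSCHLÄGE is not reported.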
Next problem: (more…)