This is what I did today: I imported the German PLS dictionary into simon, and created an additional PLS dictionary. Of course, I imported this additional dictionary into simon, too.
I copied /home/liberty/.kde/share/apps/simon/model/lexicon to /home/liberty/200908/sam/michverstanden/lexicon. Then I copied /home/liberty/.kde/share/apps/simon/model/model.voca to /home/liberty/200908/sam/michverstanden/model.voca. After that, I configured sam with the parameters that are stored in the file /home/liberty/200908/sam/michverstanden/michverstanden.sam.
I want to build a speech model using the German 01 prompts. I have these prompts in 16kHz / 16 bit from Voxforge: ralfherzog-20070816_de1.tgz. I made some modifications to the PROMPTS file (Ä instead of ä; Ö instead of ö; Ü instead of ü, SS instead of ß).
I tried to build the model with sam. But an error message occurred:
I don’t know how to solve this problem. However, I have had some experience with the phoneme & in the past:
1. Ampersand (g & N @) could be compiled
2. model.voca: changing verb to noun
Obviously, the phoneme & has to be defined. But how could that be achieved? From my point of view, we could omit this phoneme and use the phoneme E instead. This means that I could try to solve the problem by replacing the phoneme & with the phoneme E in the following files with gedit:
file:///home/liberty/200908/sam/michverstanden/lexicon
file:///home/liberty/200908/sam/michverstanden/model.voca
Maybe I will try that later.
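Instead of editing by hand in gedit, the replacement could be scripted. The following is only a minimal sketch in Python: it assumes the lexicon line layout that is quoted in this post (headword, orthography in square brackets, then phonemes separated by spaces), and the function name replace_phoneme is made up for this example. It only touches whole phoneme tokens, so an & inside a headword would stay as it is. The model.voca file is not handled here.

```python
import re

# Assumed line layout, taken from the lexicon entries quoted in this post:
#   HEADWORD [Orthography] p h o n e m e s
ENTRY = re.compile(r"^(?P<head>.+?\s+\[[^\]]*\]\s+)(?P<pron>.*)$")


def replace_phoneme(line, old, new):
    """Replace whole phoneme tokens in the pronunciation part of one line.

    The line is expected without its trailing newline.
    """
    match = ENTRY.match(line)
    if match is None:
        return line  # leave lines that do not look like entries untouched
    tokens = [new if tok == old else tok for tok in match.group("pron").split()]
    return match.group("head") + " ".join(tokens)


example = "LOS ANGELES [Los Angeles] l O s & n d Z @ l I s"
print(replace_phoneme(example, "&", "E"))
```

To apply it to a whole file, one would read the lexicon line by line, strip the newline, call replace_phoneme, and write the result to a new file, keeping the original as a backup.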
Edit: I just replaced the phoneme & with E in the files lexicon and model.voca (same path as before). Then I tried to build the model with sam. Now sam displays the following message:
Phoneme undefined: Z
Well, I think that I have to train these phonemes. So it would have been sufficient to train the phoneme &. Probably, the German 01 prompts don’t contain the phonemes Z and &. So I should include prompts that contain these phonemes. Example for the phoneme Z:
IMAGE [Image] I m I Z
I think that this entry should be fixed (to I m I d Z). But not now.
I think that I will insert two single words that contain the phonemes Z and &. And I must not forget to add these entries to the prompts file.
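If the guess is right that sam complains about phonemes that never occur in the training prompts, all missing phonemes could be listed in one pass instead of one per build attempt. This is a rough sketch under that assumption. It uses the lexicon layout quoted in this post and the prompts layout shown further down (recording name, then the words). The function names are hypothetical, and this is not how sam itself performs the check.

```python
import re

ENTRY = re.compile(r"^(?P<head>.+?)\s+\[[^\]]*\]\s+(?P<pron>.*)$")


def parse_lexicon(lines):
    """Map each headword to a list of pronunciations (each a list of phonemes)."""
    lexicon = {}
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            lexicon.setdefault(match.group("head"), []).append(match.group("pron").split())
    return lexicon


def untrained_phonemes(lexicon, prompt_lines):
    """Return phonemes used in the lexicon that no prompted word contains."""
    trained = set()
    for line in prompt_lines:
        for word in line.split()[1:]:  # the first field is the recording name
            for pron in lexicon.get(word, []):
                trained.update(pron)
    everything = {p for prons in lexicon.values() for pron in prons for p in pron}
    return sorted(everything - trained)


lexicon = parse_lexicon([
    "IMAGE [Image] I m I Z",
    "UNGLÜCKS [Unglücks] U n g l Y k s",
    "ÄGYPTEN [Ägypten] E g y p t n=",
])
print(untrained_phonemes(lexicon, ["ungluecks UNGLÜCKS"]))
```

With only UNGLÜCKS prompted, the result contains Z and y among others, which would be the phonemes to cover with additional recordings.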
Edit September 4, 2009: I recorded the wav file job-gaenge.wav with Audacity. Then I applied the following command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox job-gaenge.wav -r 16000 -c 1 -s job_gaenge.wav
Now I have the file job_gaenge.wav in my training folder. It is now necessary to modify the prompts file:
file:///home/liberty/200908/sam/michverstanden/prompts
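Since the same sox call comes up for every new recording, it could be wrapped in a small helper. The sketch below only builds and prints the command; the flags are copied verbatim from the command above, and the helper name sox_command is made up for this example. Whether the installed sox version accepts exactly these flags is an assumption.

```python
import shlex


def sox_command(source, target, rate=16000, channels=1):
    """Build the sox argument list used in this post (flags copied verbatim)."""
    return ["sox", source, "-r", str(rate), "-c", str(channels), "-s", target]


jobs = [("job-gaenge.wav", "job_gaenge.wav")]
for source, target in jobs:
    # Printing only; running it would need subprocess.run(argv, check=True).
    print(shlex.join(sox_command(source, target)))
```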
The next step would be to build the speech model with sam. I will do that now. I just started sam. I have to open the file /home/liberty/200908/sam/michverstanden/michverstanden.sam. When trying to build the model, the following error message occurred:
Phoneme undefined: y
OK, I will have to define this phoneme, too. Now I will apply the following command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox ungluecks_.wav -r 16000 -c 1 -s ungluecks.wav
What is the problem now? The following error message appeared:
Error while coding the samples!
Please check the path to HCopy (/usr/local/bin/HCopy) and the wav config (/home/liberty/200908/sam/michverstanden/wav_config)
OK, I understand: I made a small mistake. I had added to the prompts file the following line:
ungluecks.wav UNGLÜCKS
This was wrong. The following line is the correct one:
ungluecks UNGLÜCKS
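A mistake like this could be caught before starting sam. The following is a small sketch of such a check, assuming that every prompts line consists of the recording name without extension followed by the transcription, as in the corrected line above. The function name check_prompt_line is hypothetical.

```python
def check_prompt_line(line):
    """Return a list of problems found in one prompts line (empty if none)."""
    problems = []
    fields = line.split()
    if len(fields) < 2:
        problems.append("expected a recording name followed by at least one word")
        return problems
    name = fields[0]
    if name.lower().endswith(".wav"):
        problems.append("recording name should not carry the .wav extension: " + name)
    return problems


for line in ["ungluecks.wav UNGLÜCKS", "ungluecks UNGLÜCKS"]:
    print(line, "->", check_prompt_line(line) or "ok")
```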
A small mistake, and it doesn’t work. And again the same error message:
Phoneme undefined: y
I understand my mistake. Take a look into the lexicon:
UNGLÜCKS [Unglücks] U n g l Y k s
The Y and the y are different phonemes. I will train the following entry:
ÄGYPTEN [Ägypten] E g y p t n=
I don’t know why we distinguish between Y and y. The rule itself comes from Wiktionary:
[y] U+0079 only in foreign words: Physik /[fyˈsɪk]/
[ʏ] U+028F dünn /[dʏn]/, lüften /[ˈlʏftn̩]/, Symbol /[zʏmˈboːl]/
When I submit words for the dictionary acquisition project, I try to follow this rule. I don’t understand the purpose of this rule, but it is a rule. We will have to discuss this issue. I applied the following command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox aegypten-aegypten.wav -r 16000 -c 1 -s aegypten_aegypten.wav
Another problem occurs:
Phoneme undefined: E:
I will add the following word:
ANSCHLÄGE [Anschläge] a n S l E: g @
I am executing the command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox anschlaege-anschlaege.wav -r 16000 -c 1 -s anschlaege_anschlaege.wav
OK, another phoneme is missing:
Phoneme undefined: OY
I will take the following entry:
MEHRWERTSTEUER [Mehrwertsteuer] m e: @ r v e: @ r t S t OY @ r
I am executing the command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox mehrwertsteuer-mehrwertsteuer.wav -r 16000 -c 1 -s mehrwertsteuer_mehrwertsteuer.wav
OK, another problem:
Phoneme undefined: an
There is obviously an error in the lexicon:
ANFÄNGE [Anfänge] an fEN@
This is the corresponding entry in the PLS dictionary that had been imported:
<lexeme> <grapheme>Anfänge</grapheme><phoneme>an.fɛŋə</phoneme> </lexeme>
I will delete this entry from the following lexicon:
file:///home/liberty/200908/sam/michverstanden/lexicon
I might have to do that again when I replace this lexicon with a new one. So this is a good reminder for me.
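The entry shows what went wrong: the IPA string was only cut at the syllable dot, so "an" and "fEN@" ended up as two oversized "phonemes". A possible way to split such a string into one token per symbol is sketched below. The mapping table is a small hand-made IPA to X-SAMPA table that only covers symbols occurring in this post, and the function name is made up. Diphthongs and affricates such as aI or ts would not be merged by this simple approach, and this is not the conversion that simon itself uses.

```python
# Small IPA -> X-SAMPA table; only symbols that occur in this post.
IPA_TO_XSAMPA = {
    "\u025b": "E",  # open-mid front unrounded vowel
    "\u014b": "N",  # velar nasal
    "\u0259": "@",  # schwa
    "\u028f": "Y",  # near-close near-front rounded vowel
    "\u026a": "I",  # near-close near-front unrounded vowel
}
LENGTH_MARK = "\u02d0"    # IPA length mark, written ":" in X-SAMPA
SYLLABIC_MARK = "\u0329"  # combining syllabic mark, written "=" in X-SAMPA
IGNORED = {".", "\u02c8", "\u02cc"}  # syllable dot, primary and secondary stress


def ipa_to_phonemes(ipa):
    """Split an IPA string into space-separated X-SAMPA tokens."""
    tokens = []
    for ch in ipa:
        if ch in IGNORED:
            continue
        if ch == LENGTH_MARK and tokens:
            tokens[-1] += ":"  # attach length to the preceding phoneme
        elif ch == SYLLABIC_MARK and tokens:
            tokens[-1] += "="  # attach the syllabic marker likewise
        else:
            tokens.append(IPA_TO_XSAMPA.get(ch, ch))
    return " ".join(tokens)


print(ipa_to_phonemes("an.f\u025b\u014b\u0259"))  # the Anfänge entry from the PLS file
```

For the PLS entry above this gives a n f E N @, which has the same shape as the correctly split entries in the lexicon.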
OK, next problem: Phoneme undefined: dUNkl=
I think I know what the problem is. Next problem: Phoneme undefined: UnmItl=ba:rstn=
I have to delete the following lines:
UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=
Next problem: Phoneme undefined: tUnl=. I will now delete several entries with the phoneme l= because that seems to cause errors:
VIERTELFINALE [Viertelfinale] fIrtl=fi:nal@
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l @ n
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l n=
VIERTELFINALEN [Viertelfinalen] fIrtl=fi:naln=
VIERTELFINALES [Viertelfinales] f I r t @ l f i: n a l @ s
VIERTELFINALES [Viertelfinales] fIrtl=fi:nal@s
Deleting:
VOGEL [Vogel] fo:gl=
VOGELS [Vogels] fo:gl=s
Deleting:
VÖGEL [Vögel] foeh:gl=
VÖGELN [Vögeln] foeh:gl=n
Deleting:
WAFFEL [Waffel] vafl=
WAFFELN [Waffeln] vafl=n
Deleting:
TUNNEL [Tunnel] tUnl=
TUNNELN [Tunneln] t U n @ l n
TUNNELN [Tunneln] tUnl=n=
TUNNELS [Tunnels] t U n @ l s
TUNNELS [Tunnels] tUnl=s
Deleting:
TITEL [Titel] ti:tl=
TITELN [Titeln] ti:tl=n
TITELS [Titels] ti:tl=s
Deleting:
TEUFEL [Teufel] tOIfl=
TEUFELN [Teufeln] tOIfl=n
TEUFELS [Teufels] tOIfl=s
Deleting:
TEMPEL [Tempel] tEmpl=
TEMPELN [Tempeln] t E m p @ l n
TEMPELN [Tempeln] tEmpl=n
TEMPELS [Tempels] t E m p @ l s
TEMPELS [Tempels] tEmpl=s
Deleting:
TAFEL [Tafel] ta:fl=
TAFELN [Tafeln] t a: f @ l n
TAFELN [Tafeln] ta:fl=n
Deleting:
RÄTSEL [Rätsel] rE:tsl=
RÄTSELN [Rätseln] r E: ts @ l n
RÄTSELN [Rätseln] rE:tsl=n
RÄTSELS [Rätsels] r E: ts @ l s
RÄTSELS [Rätsels] rE:tsl=s
Deleting:
REGEL [Regel] Re:gl=
REGELN [Regeln] R e: g @ l n
REGELN [Regeln] Re:gl=n
REGELS [Regels] Re:gl=s
Deleting:
ONKEL [Onkel] ONkl=
ONKELN [Onkeln] O N k @ l n
ONKELN [Onkeln] ONkl=n
ONKELS [Onkels] O N k @ l s
ONKELS [Onkels] ONkl=s
Deleting:
NEBEL [Nebel] ne:bl=
NEBELN [Nebeln] n e: b @ l n=
NEBELN [Nebeln] ne:bl=n
NEBELS [Nebels] n e: b @ l s
NEBELS [Nebels] ne:bl=s
Deleting:
MÖBEL [Möbel] moeh:bl=
MÖBELS [Möbels] moeh:bl=s
Deleting:
MÄNTEL [Mäntel] mEntl=
Deleting:
MITTEL [Mittel] mItl=
MITTELFELD [Mittelfeld] mItl=fElt
MITTELFELDER [Mittelfelder] mItl=fEld@ r
MITTELFELDES [Mittelfeldes] mItl=fEld@s
MITTELFELDS [Mittelfelds] mItl=fElts
MITTELN [Mitteln] mItl=n
MITTELS [mittels] mItl=s
MITTELS [Mittels] mItl=s
MITTELSTAND [Mittelstand] m I t @ l S t a n t
MITTELSTAND [Mittelstand] mItl=Stant
MITTELSTANDES [Mittelstandes] m I t @ l S t a n d @ s
MITTELSTANDES [Mittelstandes] mItl=Stand@s
MITTELSTANDS [Mittelstands] m I t @ l S t a n ts
MITTELSTANDS [Mittelstands] mItl=Stants
Deleting:
MANTEL [Mantel] mantl=
Deleting:
LEBENSMITTEL [Lebensmittel] le:b@nsmItl=
LEBENSMITTELN [Lebensmitteln] l e: b @ n s m I t @ l n
LEBENSMITTELN [Lebensmitteln] le:b@nsmItl=n
LEBENSMITTELS [Lebensmittels] l e: b @ n s m I t @ l s
LEBENSMITTELS [Lebensmittels] le:b@nsmItl=s
Deleting:
KÄBEL [Käbel] ka:bl=
Deleting:
KUGEL [Kugel] ku:gl=
KUGELN [Kugeln] k u: g @ l n
KUGELN [Kugeln] ku:gl=n
Deleting:
KABELN [Kabeln] ka:bl=n
KABELS [Kabels] k a: b @ l s
KABELS [Kabels] ka:bl=s
Deleting:
JUBEL [Jubel] ju:bl=
JUBELS [Jubels] ju:bl=s
Deleting:
INSEL [Insel] Inzl=
INSELN [Inseln] Inzl=n
Deleting:
HÜGEL [Hügel] hy:gl=
HÜGELN [Hügeln] hy:gl=n
HÜGELS [Hügels] hy:gl=s
Deleting:
HENKEL [Henkel] hENkl=
HENKELN [Henkeln] hENkl=n
HENKELS [Henkels] hENkl=s
Deleting:
GEWECHSELT [gewechselt] g@vEksl=t
Deleting:
FLÜGEL [Flügel] fly:gl=
FLÜGELN [Flügeln] fly:gl=n
FLÜGELS [Flügels] fly:gl=s
Deleting:
ENKEL [Enkel] ENkl=
ENKELN [Enkeln] E N k @ l n
ENKELN [Enkeln] ENkl=n
ENKELS [Enkels] E N k @ l s
ENKELS [Enkels] ENkl=s
Deleting:
ENGEL [Engel] ENl=
ENGELN [Engeln] ENl=n
ENGELS [Engels] ENl=s
Deleting:
EINZELHANDEL [Einzelhandel] aInts@lhandl=
EINZELHANDEL [Einzelhandel] aIntsl=handl=
EINZELHANDELS [Einzelhandels] aI n ts @ l h a n d @ l s
EINZELHANDELS [Einzelhandels] aIntsl=handl=s
Deleting:
BIBEL [Bibel] bi:bl=
BIBELN [Bibeln] b i: b @ l n
BIBELN [Bibeln] bi:bl=n
Deleting:
ARTIKEL [Artikel] arti:kl=
ARTIKELN [Artikeln] arti:kl=n
Deleting:
WINKEL [Winkel] vINkl=
WINKELN [Winkeln] v I N k @ l n
WINKELN [Winkeln] vINkl=n
WINKELS [Winkels] v I N k @ l s
WINKELS [Winkels] vINkl=s
Deleting:
WECHSEL [Wechsel] vEksl=
WECHSELE [wechsele] vEksl=@
WECHSELN [Wechseln] vEksl=n
WECHSELN [wechseln] vEksl=n
WECHSELND [wechselnd] vEksl=nt
WECHSELS [Wechsels] vEksl=s
WECHSELST [wechselst] vEksl=st
WECHSELT [wechselt] vEksl=t
WECHSELTE [wechselte] vEksl=t@
WECHSELTEN [wechselten] vEksl=tn=
WECHSLE [wechsle] vEksl=@
Deleting:
WANDEL [Wandel] vandl=
WANDELS [Wandels] vandl=s
Deleting:
SCHLÜSSEL [Schlüssel] SlYsl=
SCHLÜSSELN [Schlüsseln] S l Y s @ l n
SCHLÜSSELN [Schlüsseln] SlYsl=n
SCHLÜSSELS [Schlüssels] S l Y s @ l s
SCHLÜSSELS [Schlüssels] SlYsl=s
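Deleting all these entries by hand is tedious, and I would have to repeat it for every new lexicon. A script could separate the entries whose pronunciation is not split into single phonemes. The following is only a sketch: it assumes the lexicon layout quoted above, and the phoneme inventory has to be supplied by hand (the one in the example is hypothetical and only large enough for the two sample lines).

```python
import re

ENTRY = re.compile(r"^(?P<head>.+?)\s+\[[^\]]*\]\s+(?P<pron>.*)$")


def split_by_inventory(lines, inventory):
    """Separate entries whose tokens are all known phonemes from the rest.

    Lines that do not look like lexicon entries end up in the rejected list.
    """
    kept, rejected = [], []
    for line in lines:
        match = ENTRY.match(line.strip())
        tokens = match.group("pron").split() if match else []
        if tokens and all(tok in inventory for tok in tokens):
            kept.append(line)
        else:
            rejected.append(line)
    return kept, rejected


# Hypothetical inventory, just large enough for the example below.
inventory = {"t", "U", "n", "@", "l", "s"}
lines = [
    "TUNNELS [Tunnels] t U n @ l s",
    "TUNNELS [Tunnels] tUnl=s",
]
kept, rejected = split_by_inventory(lines, inventory)
print("kept:", kept)
print("rejected:", rejected)
```

With this approach the variants that are already split into phonemes, such as t U n @ l s, would be kept, and only the unsplit ones such as tUnl=s would be removed.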
I will try again to build the model with sam. But the next problem occurs: Phoneme undefined: SA:s@n. I know this problem.
Deleting:
CHANCE [Chance] SA:s@
CHANCEN [Chancen] SA:s@n
Deleting:
BRANCHE [Branche] bRA:S@
BRANCHEN [Branchen] bRA:Sn=
Deleting:
ENGAGEMENT [Engagement] AgaZ@mA:
ENGAGEMENTS [Engagements] AgaZ@mA:s
Now I try again to build the model with sam. Another already known problem appears: Phoneme undefined: [Los. Deleting:
LOS ANGELES [Los Angeles] l O s & n d Z @ l I s
Deleting:
NEW YORK [New York] n j u j O R k
NEW YORKS [New Yorks] n j u j O R k s
Trying again to build the model with sam. Error: Phoneme undefined: dE:@. Deleting:
DERWEIL [derweil] dE:@ rwaIl
Deleting:
ZU VIEL [zu viel] ts u: f i: l
ZU VIELE [zu viele] ts u: f i: l @
ZU VIELEN [zu vielen] ts u: f i: l n=
ZU VIELER [zu vieler] ts u: f i: l @ r
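The message "Phoneme undefined: [Los" suggests that entries with a space in the headword are split at the wrong place, so that a part of the entry is read as a phoneme. This is my assumption, I have not verified it in the source code. Such entries could be listed in advance with a sketch like the following, which again assumes the lexicon layout quoted above and uses a made-up function name.

```python
import re

ENTRY = re.compile(r"^(?P<head>.+?)\s+\[[^\]]*\]\s+(?P<pron>.*)$")


def multiword_headwords(lines):
    """Return the headwords that consist of more than one word."""
    heads = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and len(match.group("head").split()) > 1:
            heads.append(match.group("head"))
    return heads


lines = [
    "NEW YORK [New York] n j u j O R k",
    "ZU VIEL [zu viel] ts u: f i: l",
    "DERWEIL [derweil] dE:@ rwaIl",
]
print(multiword_headwords(lines))
```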
Error message: Phoneme undefined: oeh. I guess I have to record such a word. I will choose the following one:
AUFGEHÖRT [aufgehört] aU f g @ h oeh @ r t
An alternative would be to replace the phoneme oeh with the phoneme oe. Applying the command:
liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox aufgehoert-aufgehoert.wav -r 16000 -c 1 -s aufgehoert_aufgehoert.wav
Problem: Phoneme undefined: UNION. Deleting:
EUROPÄISCHE UNION [Europäische Union] OI r o p E: I S @ u n j o: n
Problem: Phoneme undefined: wOSiNt@n. Deleting:
WASHINGTON [Washington] wOSiNt@n
WASHINGTONS [Washingtons] wOSiNt@ns
Phoneme undefined: slowakei. Deleting:
SLOWAKEI [Slowakei] slowakei
It was possible to build the speech model. The next step would be to test this model.
Tags: m e: @ r v e: @ r t S t OY @ r, OI r o p E: I S @ u n j o: n, revision 989, sam, wOSiNt@ns

In the article, I wrote: “An alternative would be to replace the phoneme oeh with the phoneme oe.”
Well, in the French pronunciation the phonemes œ and ø are distinguished. I will think about that.