Posts Tagged ‘m e: @ r v e: @ r t S t OY @ r’

michverstanden.sam

Thursday, September 3rd, 2009

This is what I did today: I imported the German PLS dictionary into simon, and created an additional PLS dictionary. Of course, I imported this additional dictionary into simon, too.

I copied /home/liberty/.kde/share/apps/simon/model/lexicon to /home/liberty/200908/sam/michverstanden/lexicon. Then I copied /home/liberty/.kde/share/apps/simon/model/model.voca to /home/liberty/200908/sam/michverstanden/model.voca. After that, I configured sam with the parameters that are stored in the file /home/liberty/200908/sam/michverstanden/michverstanden.sam

I want to build a speech model using the German 01 prompts. I have these prompts in 16 kHz / 16 bit from Voxforge: ralfherzog-20070816_de1.tgz. I made some modifications to the PROMPTS file (Ä instead of ä; Ö instead of ö; Ü instead of ü; SS instead of ß).

I tried to build the model with sam, but an error message occurred:

ampersand

I don’t know how to solve this problem. I have had some experience with the phoneme & in the past:

1. Ampersand (g & N @) could be compiled
2. model.voca: changing verb to noun

Obviously, the phoneme & has to be defined. But how could that be achieved? From my point of view, we could omit this phoneme, and replace it with the phoneme E. This means that I could try to solve the problem by exchanging the phoneme & with the phoneme E in the following files with gedit:

file:///home/liberty/200908/sam/michverstanden/lexicon
file:///home/liberty/200908/sam/michverstanden/model.voca

Maybe I will try that later.
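The replacement could also be scripted instead of done by hand in gedit. The following is only a minimal sketch: it assumes that each lexicon line has the form "WORD [Output] p1 p2 ..." with phonemes separated by spaces, and the function name replace_phoneme and the sample line are made up for illustration. For model.voca the number of leading fields to skip may be different, so that value is left as a parameter.

```python
def replace_phoneme(line, old, new, skip_fields):
    """Replace whole phoneme tokens equal to `old` with `new`.

    The first `skip_fields` whitespace-separated fields (for a lexicon
    line: the word and the [Output] form) are left untouched.
    """
    fields = line.split()
    head = fields[:skip_fields]
    phones = [new if p == old else p for p in fields[skip_fields:]]
    return " ".join(head + phones)


# Placeholder entry, not taken from the real lexicon.
sample = "WORD [Word] g & N @"
print(replace_phoneme(sample, "&", "E", skip_fields=2))
```

Because only whole tokens are compared, an & that is part of a longer token or of the word itself is not changed.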

Edit: I just replaced the phoneme & with E in the files lexicon and model.voca (same path as before). Then I tried to build the model with sam. Now sam displays the following message:

Phoneme undefined: Z

Well, I think that I have to train these phonemes. So it would have been sufficient to train the phoneme &. Probably, the German 01 prompts don’t contain the phonemes Z and &. So I should include prompts that contain these phonemes. Example for the phoneme Z:

IMAGE [Image] I m I Z

I think that this entry should be fixed (to I m I d Z). But not now.

I think that I will insert two single words that contain the phonemes Z and &. And I must not forget to add these entries to the prompts file.
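To see in advance which phonemes are still missing, the lexicon could be compared with the prompts file. This is a rough sketch under two assumptions: lexicon lines look like "WORD [Output] p1 p2 ...", and prompts lines look like "recording_id WORD WORD ...". It also assumes that sam reports a phoneme as undefined when it occurs in the lexicon but in none of the prompted words, which is my reading of the messages above and not something confirmed by sam itself. All function names are hypothetical.

```python
def parse_lexicon(lines):
    """Map each WORD to its list of phonemes."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip empty or incomplete lines
        lexicon[fields[0]] = fields[2:]
    return lexicon


def prompted_words(lines):
    """Collect all words from the prompts (first field is the recording id)."""
    words = set()
    for line in lines:
        words.update(line.split()[1:])
    return words


def untrained_phonemes(lexicon, words):
    """Phonemes used in the lexicon that no prompted word contains."""
    trained = set()
    for word in words:
        trained.update(lexicon.get(word, []))
    used = set()
    for phones in lexicon.values():
        used.update(phones)
    return sorted(used - trained)


lexicon = parse_lexicon([
    "IMAGE [Image] I m I Z",
    "UNGLÜCKS [Unglücks] U n g l Y k s",
])
words = prompted_words(["ungluecks UNGLÜCKS"])
print(untrained_phonemes(lexicon, words))
```

With these two entries and one prompt, the phonemes of IMAGE are reported, because no recording covers that word.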

Edit September 4, 2009: I recorded the wav file job-gaenge.wav with Audacity. Then I applied the following command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox job-gaenge.wav -r 16000 -c 1 -s job_gaenge.wav
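After a conversion like this, it can be useful to confirm that the result really has the 16 kHz / 16 bit / mono format of the other training files. The sketch below uses only Python's standard wave module; the function name matches_training_format is made up, and the default values are taken from the sox options above and the format of the Voxforge prompts.

```python
import wave


def matches_training_format(path, rate=16000, channels=1, sample_width=2):
    """Return True if the WAV file has the expected rate, channel count
    and sample width (2 bytes = 16 bit)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width
        )
```

It could be called as matches_training_format("job_gaenge.wav") from inside the training.data folder.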

Now I have the file job_gaenge.wav in my training folder. It is now necessary to modify the prompts file:

file:///home/liberty/200908/sam/michverstanden/prompts

The next step is to build the speech model with sam, so I started sam and opened the file /home/liberty/200908/sam/michverstanden/michverstanden.sam. When I tried to build the model, the following error message occurred:

Phoneme undefined: y

OK, I will have to define this phoneme, too. Now I will apply the following command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox ungluecks_.wav -r 16000 -c 1 -s ungluecks.wav

What is the problem now? The following error message appeared:

Error while coding the samples!

Please check the path to HCopy (/usr/local/bin/HCopy) and the wav config (/home/liberty/200908/sam/michverstanden/wav_config)

OK, I understand: I made a small mistake. I had added to the prompts file the following line:

ungluecks.wav UNGLÜCKS

This was wrong. The following line is the correct one:

ungluecks UNGLÜCKS
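A mistake like this is easy to catch before starting sam. Here is a small sketch that looks for recording ids ending in ".wav"; it assumes that the first field of every prompts line is the recording id, and the function name prompt_id_problems is hypothetical.

```python
def prompt_id_problems(lines):
    """Return (line number, id) for every prompt whose id ends in '.wav'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if fields[0].lower().endswith(".wav"):
            problems.append((number, fields[0]))
    return problems


print(prompt_id_problems([
    "ungluecks.wav UNGLÜCKS",  # wrong: id carries the file extension
    "ungluecks UNGLÜCKS",      # correct
]))
```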

A small mistake, and it doesn’t work. And again the same error message:

Phoneme undefined: y

I understand my mistake. Take a look into the lexicon:

UNGLÜCKS [Unglücks] U n g l Y k s

The Y and the y are different phonemes. I will train the following entry:

ÄGYPTEN [Ägypten] E g y p t n=

I don’t know why we distinguish between Y and y. The rule itself can be found in Wiktionary:

[y] U+0079 only in loanwords: Physik /[fyˈsɪk]/
[ʏ] U+028F dünn /[dʏn]/, lüften /[ˈlʏftn̩]/, Symbol /[zʏmˈboːl]/

When I submit words for the dictionary acquisition project, I try to follow this rule. I don’t understand the point of this rule, but it is a rule. We will have to discuss this issue. I applied the following command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox aegypten-aegypten.wav -r 16000 -c 1 -s aegypten_aegypten.wav

Another problem occurs:

Phoneme undefined: E:

I will add the following word:

ANSCHLÄGE [Anschläge] a n S l E: g @

I am executing the command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox anschlaege-anschlaege.wav -r 16000 -c 1 -s anschlaege_anschlaege.wav

OK, another phoneme is missing:

Phoneme undefined: OY

I will take the following entry:

MEHRWERTSTEUER [Mehrwertsteuer] m e: @ r v e: @ r t S t OY @ r

I am executing the command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox mehrwertsteuer-mehrwertsteuer.wav -r 16000 -c 1 -s mehrwertsteuer_mehrwertsteuer.wav

OK, another problem:

Phoneme undefined: an

There is obviously an error in the lexicon:

ANFÄNGE [Anfänge] an fEN@

This is the corresponding entry in the PLS dictionary that had been imported:

 <lexeme>
  <grapheme>Anfänge</grapheme>
  <phoneme>an.fɛŋə</phoneme>
 </lexeme>
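Entries like this one could be found in the PLS file before the import. The sketch below is only a heuristic: it flags a lexeme when its phoneme string contains a "." or is not split into at least two whitespace-separated symbols, so a genuine one-phoneme word would be flagged too. It assumes lexeme, grapheme and phoneme elements without an XML namespace, as in the fragment above; a real PLS file may declare a namespace, and then the tag names would have to be adjusted. The function name and the second sample entry are made up for illustration.

```python
import xml.etree.ElementTree as ET


def unsegmented_lexemes(pls_xml):
    """Return (grapheme, phoneme) pairs whose phoneme string looks unsplit."""
    root = ET.fromstring(pls_xml)
    suspects = []
    for lexeme in root.iter("lexeme"):
        grapheme = lexeme.findtext("grapheme", default="")
        phoneme = lexeme.findtext("phoneme", default="").strip()
        if "." in phoneme or len(phoneme.split()) < 2:
            suspects.append((grapheme, phoneme))
    return suspects


sample = """<lexicon>
 <lexeme>
  <grapheme>Anfänge</grapheme>
  <phoneme>an.fɛŋə</phoneme>
 </lexeme>
 <lexeme>
  <grapheme>Anschläge</grapheme>
  <phoneme>a n ʃ l ɛː g ə</phoneme>
 </lexeme>
</lexicon>"""
print(unsegmented_lexemes(sample))
```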

I will delete this entry from the following lexicon:

file:///home/liberty/200908/sam/michverstanden/lexicon

I might have to do that again when I replace this lexicon with a new one. So this is a good reminder for me.

OK, next problem: Phoneme undefined: dUNkl=

I think I know what the problem is. Next problem: Phoneme undefined: UnmItl=ba:rstn=

I have to delete the following lines:

UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=
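Instead of waiting for sam to report these entries one by one, the lexicon could be checked against a list of known phonemes in a single pass. This is a minimal sketch, assuming lexicon lines of the form "WORD [Output] p1 p2 ..."; the set of known phonemes here is a tiny illustrative one and would have to be replaced by the full phoneme list of the model. The function name is hypothetical.

```python
def split_by_known_phonemes(lines, known):
    """Separate lexicon lines into those using only known phonemes
    and those containing at least one unknown token."""
    good, bad = [], []
    for line in lines:
        phones = line.split()[2:]
        if phones and all(p in known for p in phones):
            good.append(line)
        else:
            bad.append(line)
    return good, bad


# Illustrative subset only, not the complete phoneme inventory.
known = {"a", "n", "S", "l", "E:", "g", "@"}
good, bad = split_by_known_phonemes([
    "ANFÄNGE [Anfänge] an fEN@",
    "ANSCHLÄGE [Anschläge] a n S l E: g @",
    "UNMITTELBAR [unmittelbar] UnmItl=ba:r",
], known)
print(bad)
```

The lines collected in bad are the candidates for deletion or for a corrected transcription.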

Next problem: (more…)