Experimenting with sam

Monday, August 24th, 2009

I am experimenting with sam. Here is the content of the file german.sam:

/home/liberty/200908/sam/german/hmmdefs
/home/liberty/200908/sam/german/tiedlist
/home/liberty/200908/sam/german/model.dict
/home/liberty/200908/sam/german/model.dfa
/home/liberty/200908/sam/german/training.data/
/home/liberty/200908/sam/german/training.data/
/home/liberty/200908/sam/german/lexicon
/home/liberty/200908/sam/german/model.grammar
/home/liberty/200908/sam/german/model.voca
/home/liberty/200908/sam/german/prompts
/home/liberty/200908/sam/german/prompts
/home/liberty/200908/sam/german/tree1.hed
/home/liberty/200908/sam/german/wav_config
16000
/home/liberty/200908/sam/german/julius.jconf

Obviously, I have to make some adjustments to the lexicon. For example, the error message Phoneme undefined: an occurred. The solution is to delete the line

ANFÄNGE [Anfänge] an fEN@

in the file /home/liberty/200908/sam/german/lexicon. After deleting this line, I save the file and click Build model again. Then I have to wait a few moments.
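Deleting such lines by hand works, but the same step repeats for every error message. The following Python sketch removes every lexicon line whose transcription contains the reported phoneme as a separate token. It assumes that each lexicon line has the form WORD [Display form] phonemes and that the file is UTF-8 encoded; both are assumptions, and the function names are made up for this example. It keeps a copy of the old file as lexicon.bak.

```python
import re
from pathlib import Path

# Assumed lexicon line format: WORD [Display form] space-separated phonemes
ENTRY = re.compile(r"^(?P<word>.*?)\s+\[(?P<display>[^\]]*)\]\s*(?P<phones>.*)$")


def phonemes_of(line: str) -> list[str]:
    """Return the phoneme tokens of a lexicon line (empty if it does not parse)."""
    match = ENTRY.match(line)
    return match.group("phones").split() if match else []


def drop_entries_with(lines: list[str], undefined: str) -> tuple[list[str], list[str]]:
    """Split lines into (kept, removed); removed lines use the undefined phoneme."""
    kept, removed = [], []
    for line in lines:
        if undefined in phonemes_of(line):
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


def clean_lexicon(path: Path, undefined: str) -> list[str]:
    """Rewrite the lexicon without the offending lines, keeping a .bak copy."""
    text = path.read_text(encoding="utf-8")  # the encoding is an assumption
    path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
    kept, removed = drop_entries_with(text.splitlines(), undefined)
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return removed
```

For the first error message, the call would be clean_lexicon(Path("/home/liberty/200908/sam/german/lexicon"), "an"). Because it only matches whole tokens, related entries such as dUNkl=st@ are not caught by "dUNkl=" and would show up as separate error messages.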

And now, the message Phoneme undefined: pf appears. Apparently, none of my recordings contains this phoneme yet, so I record the word Kopfes with Audacity (22050 hertz). Then I convert the recording to 16000 hertz mono with this command:

liberty@liberty-desktop:~/200908/sam/german$ sox kopfess.wav -r 16000 -c 1 -s kopfes.wav

Afterwards, I move the file kopfes.wav to the folder /home/liberty/200908/sam/german/training.data. Now it is necessary to add the following line to the prompts file:

kopfes KOPFES
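Because a new recording only helps if sam finds it in the expected format, a quick consistency check before pressing Build model might save a round trip. This Python sketch reads the prompts file and reports prompt lines without a matching WAV file in training.data, or with a WAV file that is not mono at 16000 Hz. It assumes that the first word of each prompt line is the file name without .wav (as in kopfes KOPFES) and that the 16000 in german.sam is the sample rate the model expects; the function name is hypothetical.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16000    # assumed to be the 16000 listed in german.sam
EXPECTED_CHANNELS = 1    # mono, as produced by "sox ... -c 1"


def prompt_problems(prompts: Path, samples: Path) -> list[str]:
    """Report prompt lines whose WAV file is missing or has the wrong format."""
    problems = []
    for line in prompts.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip empty lines
        name = line.split()[0]  # assumed: first token is the file name without .wav
        wav_path = samples / f"{name}.wav"
        if not wav_path.exists():
            problems.append(f"{name}: missing {wav_path}")
            continue
        # The wave module only reads uncompressed PCM WAV files.
        with wave.open(str(wav_path), "rb") as wav:
            rate, channels = wav.getframerate(), wav.getnchannels()
        if rate != EXPECTED_RATE or channels != EXPECTED_CHANNELS:
            problems.append(f"{name}: {rate} Hz, {channels} channel(s)")
    return problems
```

Called as prompt_problems(Path("/home/liberty/200908/sam/german/prompts"), Path("/home/liberty/200908/sam/german/training.data")), an empty list means that every prompt has a usable recording.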

After saving the prompts file, I press the Build model button again. Now the error message Phoneme undefined: dUNkl= appears. I have to delete the lines whose transcriptions are not split into single phonemes (dUNkl=, dUNkl=st@ and dUNkl=stn=):

DUNKEL [dunkel] dUNkl=
DUNKELSTE [dunkelste] d U N k @ l s t @
DUNKELSTE [dunkelste] dUNkl=st@
DUNKELSTEM [dunkelstem] d U N k @ l s t @ m
DUNKELSTEN [dunkelsten] d U N k @ l s t n=
DUNKELSTEN [dunkelsten] dUNkl=stn=

And this is the next error message: Phoneme undefined: UnmItl=ba:rstn=. I will delete the following lines:

UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=

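It seems that this is a good way to find out what went wrong during the import of the PLS dictionary. There are some inconsistencies that have to be fixed: apparently, many transcriptions were imported without spaces between the phonemes, so that a whole transcription such as UnmItl=ba:rstn= is treated as one single, undefined phoneme.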
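Instead of rebuilding the model for every single error, the lexicon could be checked in one pass. The sketch below reports every transcription token that is not in a given phoneme inventory, together with the line numbers where it occurs, so unsplit transcriptions like UnmItl=ba:rstn= show up all at once. The inventory KNOWN is a small hypothetical excerpt for illustration only and would have to be replaced with the phonemes the model really defines; the line format WORD [Display form] phonemes is also an assumption.

```python
import re
from collections import defaultdict

# Assumed lexicon line format: WORD [Display form] space-separated phonemes
ENTRY = re.compile(r"^(?P<word>.*?)\s+\[(?P<display>[^\]]*)\]\s*(?P<phones>.*)$")

# Hypothetical excerpt of a German SAMPA inventory, for illustration only.
KNOWN = {"a", "a:", "aI", "b", "d", "@", "f", "i:", "I", "k", "l", "m", "n", "N",
         "r", "s", "t", "U"}


def unknown_tokens(lines: list[str], known: set[str]) -> dict[str, list[int]]:
    """Map each token missing from the inventory to the line numbers using it."""
    report: dict[str, list[int]] = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        match = ENTRY.match(line)
        if match is None:
            report["<unparsable line>"].append(number)
            continue
        for token in match.group("phones").split():
            if token not in known:
                report[token].append(number)
    return dict(report)
```

Feeding it the lexicon via Path(...).read_text(encoding="utf-8").splitlines() would give one list of everything that still needs fixing, instead of one error message per build.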

Next error message: Phoneme undefined: tUnl=. I have to delete the lines with unsplit transcriptions (tUnl=, tUnl=n= and tUnl=s):

TUNNEL [Tunnel] tUnl=
TUNNELN [Tunneln] t U n @ l n
TUNNELN [Tunneln] tUnl=n=
TUNNELS [Tunnels] t U n @ l s
TUNNELS [Tunnels] tUnl=s

Error message: Phoneme undefined: SA:s@n
Deleting the lines:

CHANCE [Chance] SA:s@
CHANCEN [Chancen] SA:s@n

Maybe there was a French vowel in the PLS dictionary? I take a look at it. Yes:

Chance ʃɑ̃ːsə
Chancen ʃɑ̃ːsən

Ugly, but I think that in the long term we might need the French vowels. Or should we use similar German vowels instead? For Chancen, we could use e.g. ʃɔsən. Not very good, but it could be sufficient.
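If we go for similar German vowels, the substitution could happen on the IPA side before the PLS dictionary is converted. Here is a small Python sketch of that idea. The mapping is hypothetical: the ɑ̃ː to ɔ entry follows the ʃɔsən suggestion, and the short ɑ̃ entry is only a guess. The characters are written as Unicode escapes because ɑ̃ is a base letter plus a combining tilde.

```python
# Hypothetical substitutions for French nasal vowels (IPA, as found in the PLS file).
# Each nasal vowel is a base letter followed by U+0303 COMBINING TILDE.
NASAL_SUBSTITUTES = {
    "\u0251\u0303\u02d0": "\u0254",  # long nasal a (ɑ̃ː) -> ɔ, as in ʃɔsən
    "\u0251\u0303": "\u0254",        # short nasal a (ɑ̃) -> ɔ, a guess
}


def denasalize(ipa: str) -> str:
    """Replace French nasal vowels with German approximations."""
    # Longest sequences first, so that ɑ̃ː is not split into ɑ̃ plus a stray ː.
    for nasal in sorted(NASAL_SUBSTITUTES, key=len, reverse=True):
        ipa = ipa.replace(nasal, NASAL_SUBSTITUTES[nasal])
    return ipa
```

With this mapping, Chancen (ʃɑ̃ːsən) becomes ʃɔsən and Chance (ʃɑ̃ːsə) becomes ʃɔsə.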

Error message: Phoneme undefined: mIta:%baIt@
I have to delete the following line:

MITARBEITER [Mitarbeiter] mIta:%baIt@ r

The corresponding entry in the PLS dictionary:

Mitarbeiter mɪtʔaːˌbaɪ̯tɐ

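Normally, the PLS dictionary doesn't contain any stress information. Why not? Because it is easier. But obviously, this entry does contain stress information: the IPA secondary stress mark ˌ, which shows up as the SAMPA secondary stress mark % in the converted entry.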
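Since stress marks are the exception in the PLS dictionary, it might be simpler to find and strip them before the import. In IPA, ˈ (U+02C8) marks primary stress and ˌ (U+02CC) secondary stress. The following Python sketch lists the entries that contain either mark and removes the marks; taking (word, IPA) pairs as input and the function names are assumptions, and whether a stripped entry then converts cleanly is untested.

```python
# IPA stress marks: U+02C8 primary stress, U+02CC secondary stress
STRESS_MARKS = "\u02c8\u02cc"
DELETE_STRESS = str.maketrans("", "", STRESS_MARKS)


def strip_stress(ipa: str) -> str:
    """Remove primary and secondary stress marks from an IPA transcription."""
    return ipa.translate(DELETE_STRESS)


def stressed_words(entries: list[tuple[str, str]]) -> list[str]:
    """Return the words whose IPA transcription contains a stress mark."""
    return [word for word, ipa in entries if any(mark in ipa for mark in STRESS_MARKS)]
```

Applied to Mitarbeiter, strip_stress turns mɪtʔaːˌbaɪ̯tɐ into mɪtʔaːbaɪ̯tɐ.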

Error message: Phoneme undefined: fIrtl=fi:nal@s

Removing the lines with unsplit transcriptions:

VIERTELFINALE [Viertelfinale] fIrtl=fi:nal@
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l @ n
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l n=
VIERTELFINALEN [Viertelfinalen] fIrtl=fi:naln=
VIERTELFINALES [Viertelfinales] f I r t @ l f i: n a l @ s
VIERTELFINALES [Viertelfinales] fIrtl=fi:nal@s

Error message: Phoneme undefined: arti:kl=

I have to remove the following lines:

ARTIKEL [Artikel] arti:kl=
ARTIKELN [Artikeln] arti:kl=n

Error message: Phoneme undefined: aIntsl=handl=s

I have to remove the lines with unsplit transcriptions:

EINZELHANDEL [Einzelhandel] aInts@lhandl=
EINZELHANDEL [Einzelhandel] aIntsl=handl=
EINZELHANDELS [Einzelhandels] aI n ts @ l h a n d @ l s
EINZELHANDELS [Einzelhandels] aIntsl=handl=s

Error message: Phoneme undefined: fo:gl=s
Removing the lines:

VOGEL [Vogel] fo:gl=
VOGELS [Vogels] fo:gl=s

Message: Phoneme undefined: [Los

I think the problem is the word Los Angeles:

LOS [Los] l o: s
LOS ANGELES [Los Angeles] l O s & n d Z @ l I s
LOSE [Lose] l o: z @

I am removing the entry “Los Angeles”.

Well, lots of problems so far. Let’s see what the next error message will be: Phoneme undefined: [New

I think I know which word is wrong. Is it New York? Of course it is:

NEW YORK [New York] n j u j O R k
NEW YORKS [New Yorks] n j u j O R k s

I will remove these two lines.

It seems that entries consisting of two words are causing problems: apparently, such a line is split at the spaces, so that a part of it, such as [Los, is read as a phoneme.
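To find the remaining multi-word entries without waiting for the next error message, a short scan for spaces in the word column might help. This Python sketch assumes the lexicon line format WORD [Display form] phonemes; the function name is made up for this example.

```python
import re

# Assumed lexicon line format: WORD [Display form] space-separated phonemes
ENTRY = re.compile(r"^(?P<word>.*?)\s+\[(?P<display>[^\]]*)\]\s*(?P<phones>.*)$")


def multi_word_entries(lines: list[str]) -> list[str]:
    """Return lexicon lines whose word (the part before '[') contains a space."""
    result = []
    for line in lines:
        match = ENTRY.match(line)
        if match and " " in match.group("word").strip():
            result.append(line)
    return result
```

Applied to the LOS lines above, it returns the LOS ANGELES line but not LOS or LOSE.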

Phoneme undefined: NAMEN
Remove the line:
IM NAMEN [im Namen] I m n a: m @ n

I think that there are more entries that will cause errors.

Phoneme undefined: ta:fl=

Removing the lines with unsplit transcriptions:

TAFEL [Tafel] ta:fl=
TAFELN [Tafeln] t a: f @ l n
TAFELN [Tafeln] ta:fl=n

Phoneme undefined: dE:@
Remove:
DERWEIL [derweil] dE:@ rwaIl

Phoneme undefined: bi:bl=n
Remove the lines with unsplit transcriptions:

BIBEL [Bibel] bi:bl=
BIBELN [Bibeln] b i: b @ l n
BIBELN [Bibeln] bi:bl=n

Phoneme undefined: bRA:S@
Remove the lines:
BRANCHE [Branche] bRA:S@
BRANCHEN [Branchen] bRA:Sn=

It is the same problem as above (Chance).

I will try to build the model one more time.

Phoneme undefined: foeh:gl=n
I will remove the following lines:
VÖGEL [Vögel] foeh:gl=
VÖGELN [Vögeln] foeh:gl=n

OK, that is enough for now. Let’s stop here.