Posts Tagged ‘revision 909’

Experimenting with sam

Monday, August 24th, 2009

I am experimenting with sam. Here is the content of the file german.sam:

/home/liberty/200908/sam/german/hmmdefs
/home/liberty/200908/sam/german/tiedlist
/home/liberty/200908/sam/german/model.dict
/home/liberty/200908/sam/german/model.dfa
/home/liberty/200908/sam/german/training.data/
/home/liberty/200908/sam/german/training.data/
/home/liberty/200908/sam/german/lexicon
/home/liberty/200908/sam/german/model.grammar
/home/liberty/200908/sam/german/model.voca
/home/liberty/200908/sam/german/prompts
/home/liberty/200908/sam/german/prompts
/home/liberty/200908/sam/german/tree1.hed
/home/liberty/200908/sam/german/wav_config
16000
/home/liberty/200908/sam/german/julius.jconf

Obviously, I have to make some adjustments to the lexicon. For example, the error message Phoneme undefined: an occurred. The solution is to delete the line

ANFÄNGE [Anfänge] an fEN@

in the file /home/liberty/200908/sam/german/lexicon. After deleting this line, I save the file and click Build model again. I have to wait a few moments.

Now the message Phoneme undefined: pf appears. Apparently the phoneme pf does not occur in my training data yet, so I record the word Kopfes with Audacity (at 22050 hertz) and resample it to 16000 hertz:

liberty@liberty-desktop:~/200908/sam/german$ sox kopfess.wav -r 16000 -c 1 -s kopfes.wav

Then I move the file kopfes.wav to the folder /home/liberty/200908/sam/german/training.data. Now it is necessary to add the following line to the prompts file:

kopfes KOPFES
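
Writing such lines by hand gets tedious once there are many recordings. Here is a minimal Python sketch, assuming that a prompts entry is simply the wav file name without its extension followed by the words in upper case (which is what the line above suggests). The helper prompts_line is hypothetical and not part of simon or sam.

```python
from pathlib import Path


def prompts_line(wav_file, text):
    """Return one prompts entry: the file stem, then the words in upper case."""
    stem = Path(wav_file).stem            # "training.data/kopfes.wav" -> "kopfes"
    words = " ".join(text.upper().split())  # normalise whitespace, upper-case
    return f"{stem} {words}"
```

Calling prompts_line("training.data/kopfes.wav", "Kopfes") gives the line shown above. Note that Python upper-cases "ß" to "SS", which may or may not match the spelling used in the lexicon.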

After saving the prompts file, I press the Build model button again. Now the error message Phoneme undefined: dUNkl= appears. Of the following lines, I have to delete those whose transcription is not split into separate phonemes:

DUNKEL [dunkel] dUNkl=
DUNKELSTE [dunkelste] d U N k @ l s t @
DUNKELSTE [dunkelste] dUNkl=st@
DUNKELSTEM [dunkelstem] d U N k @ l s t @ m
DUNKELSTEN [dunkelsten] d U N k @ l s t n=
DUNKELSTEN [dunkelsten] dUNkl=stn=

And this is the next error message: Phoneme undefined: UnmItl=ba:rstn=. I will delete the following lines:

UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=

It seems that this is a good way to find out what went wrong during the import of the PLS dictionary. There are some inconsistencies that have to be fixed.
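
Instead of fixing one entry per build, the whole lexicon could be checked in a single pass. The following is only a sketch under assumptions: it takes the lexicon format to be WORD [Output] followed by space-separated phonemes (as in the excerpts above), and it expects the caller to supply the set of phonemes that the model actually defines. The function name undefined_phonemes is made up for this illustration.

```python
import re

# Assumed entry layout: WORD [Output] phoneme phoneme ...
ENTRY = re.compile(r"^(.+?)\s+\[([^\]]*)\]\s+(.+)$")


def undefined_phonemes(lexicon_lines, known_phonemes):
    """Yield (line_number, word, bad_phonemes) for every suspicious entry."""
    for number, line in enumerate(lexicon_lines, start=1):
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # blank or malformed line; not handled in this sketch
        word, _output, transcription = match.groups()
        # Every whitespace-separated token must be a known phoneme.
        bad = [p for p in transcription.split() if p not in known_phonemes]
        if bad:
            yield number, word, bad
```

An unseparated transcription such as dUNkl= is reported as one unknown token, which mirrors the error message that the model build prints.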

Next error message: Phoneme undefined: tUnl=. Again, I have to delete the lines whose transcription is not split into separate phonemes:

TUNNEL [Tunnel] tUnl=
TUNNELN [Tunneln] t U n @ l n
TUNNELN [Tunneln] tUnl=n=
TUNNELS [Tunnels] t U n @ l s
TUNNELS [Tunnels] tUnl=s

Error message: Phoneme undefined: SA:s@n
Deleting the lines:

CHANCE [Chance] SA:s@
CHANCEN [Chancen] SA:s@n

Maybe there was a French vowel in the PLS dictionary? I will take a look. Yes:

Chance ʃɑ̃ːsə
Chancen ʃɑ̃ːsən

Ugly, but I think that in the long term we might need the French vowels. Or should we use similar German vowels? We could use e.g. ʃɔsən. Not very good, but it could be sufficient.
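
The substitution idea could be sketched like this. It is only an illustration: the mapping from the French nasal vowel to ɔ is the rough approximation suggested above, not a linguistically sound rule, and both NASAL_FALLBACK and fold_nasal_vowels are hypothetical names. The characters are written as escapes so that the combining tilde is unambiguous.

```python
import unicodedata

# Hypothetical fallback table: French nasal vowel -> similar German vowel.
# The long variant comes first so that the length mark is consumed as well.
NASAL_FALLBACK = {
    "\u0251\u0303\u02d0": "\u0254",  # long nasal a (a + tilde + length) -> open o
    "\u0251\u0303": "\u0254",        # short nasal a (a + tilde) -> open o
}


def fold_nasal_vowels(ipa):
    """Replace French nasal vowels in an IPA string by the fallback vowels."""
    # Decompose first so that the tilde is always a separate combining mark.
    text = unicodedata.normalize("NFD", ipa)
    for nasal, plain in NASAL_FALLBACK.items():
        text = text.replace(nasal, plain)
    return unicodedata.normalize("NFC", text)
```

Applied to the PLS transcription of Chancen, this yields the ʃɔsən form mentioned above.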

Error message: Phoneme undefined: mIta:%baIt@
I have to delete the following line:

MITARBEITER [Mitarbeiter] mIta:%baIt@ r

The corresponding entry in the PLS dictionary:

Mitarbeiter mɪtʔaːˌbaɪ̯tɐ

Normally, the PLS dictionary doesn’t contain any stress information. Why not? Because it is easier. But obviously, this entry does contain a stress mark: the IPA secondary stress sign ˌ, which became % in the lexicon.

Error message: Phoneme undefined: fIrtl=fi:nal@s

Removing the lines whose transcription is not split into separate phonemes:

VIERTELFINALE [Viertelfinale] fIrtl=fi:nal@
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l @ n
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l n=
VIERTELFINALEN [Viertelfinalen] fIrtl=fi:naln=
VIERTELFINALES [Viertelfinales] f I r t @ l f i: n a l @ s
VIERTELFINALES [Viertelfinales] fIrtl=fi:nal@s

Error message: Phoneme undefined: arti:kl=

I have to remove the following lines:

ARTIKEL [Artikel] arti:kl=
ARTIKELN [Artikeln] arti:kl=n

Error message: Phoneme undefined: aIntsl=handl=s

I have to remove the lines whose transcription is not split into separate phonemes:

EINZELHANDEL [Einzelhandel] aInts@lhandl=
EINZELHANDEL [Einzelhandel] aIntsl=handl=
EINZELHANDELS [Einzelhandels] aI n ts @ l h a n d @ l s
EINZELHANDELS [Einzelhandels] aIntsl=handl=s

Error message: Phoneme undefined: fo:gl=s
Removing the lines:

VOGEL [Vogel] fo:gl=
VOGELS [Vogels] fo:gl=s

Message: Phoneme undefined: [Los

I think the problem is the word Los Angeles:

LOS [Los] l o: s
LOS ANGELES [Los Angeles] l O s & n d Z @ l I s
LOSE [Lose] l o: z @

I am removing the entry “Los Angeles”.

Well, lots of problems so far. Let’s see what the next error message will be: Phoneme undefined: [New

I think I know which word is wrong. Is it New York? Of course it is:

NEW YORK [New York] n j u j O R k
NEW YORKS [New Yorks] n j u j O R k s

I will remove these two lines.

It seems that expressions that consist of two single words are causing problems.
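
Such entries can be listed up front. A small sketch, assuming the word part of an entry is everything before the opening square bracket; multiword_entries is a hypothetical helper name.

```python
def multiword_entries(lexicon_lines):
    """Return the entries whose word part contains more than one token."""
    hits = []
    for line in lexicon_lines:
        # Everything before the first "[" is taken to be the word part.
        head, bracket, _rest = line.partition("[")
        if bracket and len(head.split()) > 1:
            hits.append(line.strip())
    return hits
```

For the excerpts above it would return the entries for Los Angeles and New York, but not LOS or LOSE.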

Phoneme undefined: NAMEN
Remove the line:
IM NAMEN [im Namen] I m n a: m @ n

I think that there are more entries that will cause errors.

Phoneme undefined: ta:fl=

Removing the lines whose transcription is not split into separate phonemes:

TAFEL [Tafel] ta:fl=
TAFELN [Tafeln] t a: f @ l n
TAFELN [Tafeln] ta:fl=n

Phoneme undefined: dE:@
Remove:
DERWEIL [derweil] dE:@ rwaIl

Phoneme undefined: bi:bl=n
Remove the lines whose transcription is not split into separate phonemes:

BIBEL [Bibel] bi:bl=
BIBELN [Bibeln] b i: b @ l n
BIBELN [Bibeln] bi:bl=n

Phoneme undefined: bRA:S@
Remove the lines:
BRANCHE [Branche] bRA:S@
BRANCHEN [Branchen] bRA:Sn=

This is the same problem as above (Chance).

I will try to build the model one more time.

Phoneme undefined: foeh:gl=n
I will remove the following lines:
VÖGEL [Vögel] foeh:gl=
VÖGELN [Vögeln] foeh:gl=n

OK, that is enough for now. Let’s stop here.

Import PLS dictionary to active vocabulary

Sunday, August 23rd, 2009

I imported the whole PLS dictionary /home/liberty/200905/voxDE20090209.xml into the active vocabulary. This feature was added to simon a few weeks ago:

“simon can now import dictionaries to the active lexicon.”

My next goal is to hit the 1000-word mark: simon should recognize 1000 words. At the moment, I have major recognition problems. simon isn’t very responsive. It recognizes e.g. the word “abnahmen”, but when I dictate other words (which are of course part of the active vocabulary and which I have trained successfully), simon doesn’t react. Maybe it has something to do with the confidence score? Or maybe the speech model was changed while I was playing with sam?

Well, the active vocabulary now contains more than 8000 words. When I dictate, simon now recognizes words that I have never trained. And of course, it recognizes the wrong words. So I will have to figure out how to adjust the speech model.

For example, I could record lots of single words with Audacity (not utterances, because I find it difficult to define an appropriate grammar) and then use the Export Multiple... function. I am using Audacity in combination with my external USB sound card. Under Ubuntu, this sound card only works with 22050 hertz, not with 16000 hertz. This is the reason why I am using my on-board sound card when dictating into simon directly (= recognition) or when recording words with simon (= training).

It is a bit complicated to explain. I prefer Audacity for recording because it allows me to record lots of training samples in a short amount of time. So if I record with Audacity in 22050 hertz, I have to resample the wav files with sox. I tested the command from the Sphinx guide. The following command allowed me to transform a 22050 hertz file successfully into 16000 hertz:

$ sox de27-02.wav -r 16000 -c 1 -s de27-02-test.wav
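
For a whole folder of exported files, the same call could be wrapped in a short script. This is only a sketch: it assumes that sox is installed and on the PATH, and the helper names sox_command and resample_directory are made up. The options are exactly those of the command above.

```python
from pathlib import Path
import subprocess


def sox_command(src, dst, rate=16000):
    """Build the sox call used above: resample to `rate`, one channel, signed."""
    return ["sox", str(src), "-r", str(rate), "-c", "1", "-s", str(dst)]


def resample_directory(src_dir, dst_dir, rate=16000):
    """Resample every .wav file in src_dir into dst_dir (requires sox)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.wav")):
        # check=True stops at the first file that sox cannot convert.
        subprocess.run(sox_command(src, dst_dir / src.name, rate), check=True)
```

A call such as resample_directory("export", "training.data") would convert all exported files in one go. As far as I know, newer SoX releases write the encoding option as -e signed-integer instead of -s, so the flag may need adjusting depending on the installed version.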

With Audacity, I could record all 8000 words that are now in my active vocabulary, let’s say in packages of 100 words. Two years ago, Audacity allowed me to export only about 30 wav files at a time; otherwise the application would crash. I will have to test the current version of Audacity. Probably, this issue has been fixed.

My main concern is that the words of my dictionary are often very similar. Here is an example:

DUTZEND [Dutzend] d U ts @ n t
DUTZEND [Dutzend] d U ts n= t
DUTZENDE [Dutzende] d U ts @ n d @
DUTZENDE [Dutzende] d U ts n= d @
DUTZENDEN [Dutzenden] d U ts @ n d @ n
DUTZENDEN [Dutzenden] d U ts n= d @ n
DUTZENDEN [Dutzenden] d U ts n= d n=
DUTZENDS [Dutzends] d U ts n= ts

Eight entries that are very similar. I think that this is very hard to train successfully. I will have to find out how I can achieve my 1000-word goal. Maybe I should reduce the size of the active vocabulary from 8000 words to 1000 words? Then I could use a set of words that aren’t too similar, and thus I would get better recognition results.

And I could sort out short words. Short words are harder to recognize than long words. One trick is to train only long words and to leave out the short ones.
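
Reducing the vocabulary in this way could be sketched as follows. The assumptions are the same lexicon layout as in the excerpts (WORD [Output] followed by space-separated phonemes), an arbitrary threshold of five phonemes for a "long" word, and one pronunciation per word. The name pick_long_words is hypothetical.

```python
import re

# Assumed entry layout: WORD [Output] phoneme phoneme ...
ENTRY = re.compile(r"^(.+?)\s+\[([^\]]*)\]\s+(.+)$")


def pick_long_words(lexicon_lines, min_phonemes=5, limit=1000):
    """Keep, per word, the first pronunciation that is long enough."""
    chosen = {}
    for line in lexicon_lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        word, _output, transcription = match.groups()
        if word in chosen:
            continue  # ignore further pronunciation variants of this word
        if len(transcription.split()) >= min_phonemes:
            chosen[word] = line.strip()
        if len(chosen) == limit:
            break
    return list(chosen.values())
```

This only filters by length. It does not check whether the remaining words are still too similar to each other, which would need a separate comparison of the transcriptions.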

I am interested in the following function:

“simon can now import prompts files through the import training data wizard.”

This is a very interesting function for me. I have recorded lots of utterances. They could be imported into simon. But I have one problem: I didn’t define an appropriate grammar. I could use a grammar that uses just one category of words (e.g. all words are marked as noun, it doesn’t matter what they really are; they can be adverbs, adjectives, verbs, etc.). So this could be the way to go.
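
To make the one-category idea concrete, here is a sketch that writes grammar and vocabulary text in the style of the Julius grammar files (model.grammar and model.voca). It assumes the usual Julius conventions of a category header starting with % and the silence categories NS_B and NS_E; the category name NOUN and both function names are made up, and how simon manages its grammar internally is not covered here.

```python
def single_category_grammar(category="NOUN"):
    """Grammar text: silence, exactly one word of the category, silence."""
    return f"S : NS_B {category} NS_E\n"


def single_category_voca(entries, category="NOUN"):
    """Vocabulary text with every word filed under the same category.

    `entries` is an iterable of (word, list_of_phonemes) pairs.
    """
    lines = ["% NS_B", "<s> sil", "", "% NS_E", "</s> sil", "", f"% {category}"]
    lines += [f"{word} {' '.join(phonemes)}" for word, phonemes in entries]
    return "\n".join(lines) + "\n"
```

This grammar accepts exactly one word per utterance. For recorded utterances with several words, the rule would have to allow a sequence of words of the category.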

I think that the 1000-word goal could be hit with the present vocabulary of 8000 words. Julius supports dictation with 20,000 words, so 1000 words is a reasonable goal. When I have reached that goal, I will have to think about the following question: How can I hit the 10,000-word mark? First, I need a bigger lexicon. I don’t want to use BOMP since it would be necessary to write them an email. I prefer to stick to free dictionaries.

Another solution could be that I switch from the German PLS dictionary to the English Voxforge dictionary. I could do the testing in English.

Checked out revision 909

Saturday, August 22nd, 2009

I just checked out revision 909:

liberty@liberty-desktop:~/200908$ svn co https://speech2text.svn.sourceforge.net/svnroot/speech2text/
Checked out revision 909.
liberty@liberty-desktop:~/200908/speech2text/trunk$ ./build_ubuntu.sh

It seems that it is possible to save projects in a special sam format (suffix .sam).