
Importing the VoxForge dictionary

Monday, August 31st, 2009

I am now importing the VoxForge dictionary into simon from this location: /home/liberty/200908/sam/english/VoxForgeDict. I had downloaded it earlier as the archive VoxForge.tgz. The dictionary is in HTK format. What does the HTK format look like? Here is a small excerpt from the dictionary:

APPROACH [APPROACH] ax p r ow ch
APPROACHABLE [APPROACHABLE] ax p r ow ch ax b ax l
APPROACHED [APPROACHED] ax p r ow ch t
APPROACHES [APPROACHES] ax p r ow ch ax z
APPROACHES(2) [APPROACHES] ax p r ow ch ix z
APPROACHING [APPROACHING] ax p r ow ch ix ng
APPROBATION [APPROBATION] ae p r ax b ey sh ax n

The VoxForge dictionary contains about 130k words.
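To make the layout of these lines concrete, here is a minimal Python sketch that splits one HTK-style entry into its parts. It only assumes the shape visible in the excerpt (word, optional variant number in parentheses, output symbol in square brackets, then phonemes); the function name parse_htk_line and the returned dictionary keys are my own choices and are not part of simon or HTK.

```python
import re

# One line as seen in the excerpt: WORD or WORD(n), [OUTPUT], then phonemes.
# This pattern is an assumption derived from the excerpt, not a full HTK grammar.
HTK_LINE = re.compile(r"^(\S+?)(?:\((\d+)\))?\s+\[([^\]]*)\]\s+(.+)$")

def parse_htk_line(line):
    """Split one HTK-style dictionary line into word, variant, output, phonemes."""
    match = HTK_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an HTK-style dictionary line: {line!r}")
    word, variant, output, phones = match.groups()
    return {
        "word": word,
        # Lines without "(n)" are treated as the first pronunciation variant.
        "variant": int(variant) if variant else 1,
        "output": output,
        "phonemes": phones.split(),
    }

entry = parse_htk_line("APPROACHES(2) [APPROACHES] ax p r ow ch ix z")
print(entry["word"], entry["variant"], entry["phonemes"])
```

The entry APPROACHES(2) is the interesting case: the "(2)" marks an alternative pronunciation, while the output symbol in brackets stays APPROACHES.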

Earlier, I had imported /home/liberty/200908/sam/english/cmudict.0.6d, which is in Sphinx format:

APPROACH AH0 P R OW1 CH
APPROACHABLE AH0 P R OW1 CH AH0 B AH0 L
APPROACHED AH0 P R OW1 CH T
APPROACHES AH0 P R OW1 CH AH0 Z
APPROACHES(2) AH0 P R OW1 CH IH0 Z
APPROACHING AH0 P R OW1 CH IH0 NG
APPROBATION AE2 P R AH0 B EY1 SH AH0 N
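The Sphinx-style lines can be taken apart in a similar way. The sketch below assumes only what the excerpt shows: a word (optionally with a variant number in parentheses) followed by upper-case phonemes, where vowels carry a trailing digit. In cmudict that digit is the stress marker (0 = unstressed, 1 = primary, 2 = secondary). The function name parse_sphinx_line is hypothetical.

```python
import re

def parse_sphinx_line(line):
    """Split one cmudict-style line into (word, variant, phonemes, stress)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"not a dictionary line: {line!r}")
    head, phones = fields[0], fields[1:]

    # "APPROACHES(2)" -> word "APPROACHES", variant 2; plain words are variant 1.
    match = re.fullmatch(r"(.+?)\((\d+)\)", head)
    word, variant = (match.group(1), int(match.group(2))) if match else (head, 1)

    phonemes, stress = [], []
    for phone in phones:
        if phone[-1].isdigit():
            # Vowel: separate the phoneme from its stress digit.
            phonemes.append(phone[:-1])
            stress.append(int(phone[-1]))
        else:
            # Consonant: no stress marker.
            phonemes.append(phone)
            stress.append(None)
    return word, variant, phonemes, stress

print(parse_sphinx_line("APPROBATION AE2 P R AH0 B EY1 SH AH0 N"))
```

Keeping the stress digits in a separate list makes it easy to drop them later, for example when comparing against a dictionary that has no stress markers.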

So now you know the difference between a dictionary stored in HTK format and one stored in Sphinx format. Both dictionaries, VoxForgeDict and cmudict.0.6d, each contain about 130k words. I don’t know whether they share the same phoneme set. My guess is that both lexicons use CMU-40, but I could be wrong!
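One way to check the phoneme-set question would be to collect the phoneme inventory of each dictionary and compare the two sets. The sketch below does this for the two excerpts only, so it says nothing definite about the full 130k-word files. Stripping the stress digits and lower-casing the Sphinx phonemes is my own normalization, and the function names are hypothetical.

```python
HTK_EXCERPT = """\
APPROACH [APPROACH] ax p r ow ch
APPROACHABLE [APPROACHABLE] ax p r ow ch ax b ax l
APPROACHED [APPROACHED] ax p r ow ch t
APPROACHES [APPROACHES] ax p r ow ch ax z
APPROACHES(2) [APPROACHES] ax p r ow ch ix z
APPROACHING [APPROACHING] ax p r ow ch ix ng
APPROBATION [APPROBATION] ae p r ax b ey sh ax n
"""

SPHINX_EXCERPT = """\
APPROACH AH0 P R OW1 CH
APPROACHABLE AH0 P R OW1 CH AH0 B AH0 L
APPROACHED AH0 P R OW1 CH T
APPROACHES AH0 P R OW1 CH AH0 Z
APPROACHES(2) AH0 P R OW1 CH IH0 Z
APPROACHING AH0 P R OW1 CH IH0 NG
APPROBATION AE2 P R AH0 B EY1 SH AH0 N
"""

def htk_phonemes(lines):
    """Collect every phoneme that follows the [OUTPUT] field."""
    phones = set()
    for line in lines:
        if "]" not in line:
            continue  # skip blank or unexpected lines
        phones.update(line.split("]", 1)[1].split())
    return phones

def sphinx_phonemes(lines):
    """Collect phonemes with stress digits removed, lower-cased for comparison."""
    phones = set()
    for line in lines:
        for token in line.split()[1:]:
            phones.add(token.rstrip("012").lower())
    return phones

htk = htk_phonemes(HTK_EXCERPT.splitlines())
sphinx = sphinx_phonemes(SPHINX_EXCERPT.splitlines())
print("only in HTK excerpt:   ", sorted(htk - sphinx))
print("only in Sphinx excerpt:", sorted(sphinx - htk))
```

On these seven lines the HTK excerpt uses ax and ix where the Sphinx excerpt has AH0 and IH0, so at least in the excerpts the two inventories are not identical. Whether that holds for the complete dictionaries would need the same comparison run over the full files.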

I think that I will stick to VoxForgeDict because it is in HTK format.

So, what will be my next step? I want to train a few words with simon (this, is, a, different, approach). Then I will compile the speech model (= synchronize with ksimond). After that, I will check whether simon recognizes my voice.

If it works, I will try to run a test with sam. I want to test the sentence This is a different approach. with sam. It is the first sentence of my English files.

I have to get familiar with the whole training and testing process. In the long term, I want to use sam for model creation and model testing.

I just imported the lexicon VoxForgeDict into simon. Maybe I should define a grammar now? I just did that: Now I have a grammar with just one category: Unknown. I know that this isn’t sufficient for testing the whole sentence This is a different approach., but I will try to fix that later when the problem occurs.

I am now adding the word this:

this

I can’t record the word because I can’t restart simon. I think that I will have to get the current snapshot via svn.