Posts Tagged ‘sam’

sam: sorting should be improved

Sunday, August 29th, 2010

A few minutes ago, I tested section ‘xea’ against all. The overall recognition rate was about 95%.

[Image: recognition-rate]

Take a closer look at the right column: The word DIVERGIERST has a recognition rate of 100%. The word DIN has a recognition rate of 11.9569%. You can see that the sorting is not correct.

Btw, it would be interesting to export the recognition results into a text file. What about adding an Export test results button?
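One possible explanation for the wrong order is that the column is compared as text rather than as numbers. This is only a guess; the sam source code was not checked. As text, "100" sorts before "11.9569", because the second character "0" comes before "1". The Python sketch below shows the difference and writes the numerically sorted results to a text stream. The function names and the tab-separated output format are assumptions for illustration, not something sam provides.

```python
import io

def sort_as_text(rates):
    """Sort (word, rate_string) pairs by comparing the rate column as text."""
    return sorted(rates, key=lambda pair: pair[1])

def sort_as_numbers(rates):
    """Sort (word, rate_string) pairs by the numeric value of the rate."""
    return sorted(rates, key=lambda pair: float(pair[1]))

def export_results(rates, stream):
    """Write one 'word<TAB>rate' line per entry, lowest rate first (hypothetical format)."""
    for word, rate in sort_as_numbers(rates):
        stream.write(f"{word}\t{rate}\n")

# The two values quoted in this post
rates = [("DIVERGIERST", "100"), ("DIN", "11.9569")]
buffer = io.StringIO()
export_results(rates, buffer)
print(buffer.getvalue())
```

With a text comparison, DIVERGIERST (100) ends up before DIN (11.9569), which matches the order seen in the table.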

sam: test section ‘xea’ against all

Sunday, August 29th, 2010

At the moment, I am using sam to test the speech model xea (contains about 800 German words) against Ralf’s German speech model 0.1.6 (contains about 25000 German words). The file file:///home/ubuntu/Documents/201008/sam-0.1.6/test-xea-against-all.sam has the following content:

/home/ubuntu/.kde/share/apps/simon/model/hmmdefs
/home/ubuntu/.kde/share/apps/simon/model/tiedlist
/home/ubuntu/.kde/share/apps/simon/model/model.dict
/home/ubuntu/.kde/share/apps/simon/model/model.dfa
/home/ubuntu/Documents/201008/wav-all
/home/ubuntu/Documents/201006/audacity/xea-folder/wav-xea
/home/ubuntu/Documents/201008/sam-0.1.6/lexicon
/home/ubuntu/Documents/201008/sam-0.1.6/model.grammar
/home/ubuntu/Documents/201008/sam-0.1.6/simple.voca
/home/ubuntu/Documents/201008/prompts-wav.txt
/home/ubuntu/Documents/201006/audacity/xea-folder/prompts-xea
/usr/share/kde4/apps/simon/model/tree1.hed
/usr/share/kde4/apps/simon/model/wav_config
16000
/usr/share/kde4/apps/simond/default.jconf
2

Question: what does the 2 in the last line mean?

By the way, I used the Serialize scenarios button (input file:///home/ubuntu/Documents/201007/german-speech-model-0.1.6/scenario-0.1.6.xml) to generate the following files:
file:///home/ubuntu/Documents/201008/sam-0.1.6/lexicon
file:///home/ubuntu/Documents/201008/sam-0.1.6/model.grammar
file:///home/ubuntu/Documents/201008/sam-0.1.6/simple.voca

I think that it is a good idea to test a subset (e.g. xea) against the superset (xaa+xab+xac+…+xpw).

sam handbook

Thursday, August 12th, 2010

During the last couple of minutes, I took a look at the sam handbook:

“You can sort the files by each column simply by clicking on the column header. This way it is very easy to find bad samples by sorting by recognition rate.”

This function worked partially on my computer (when using a previous development version of sam). I hope that this issue will be fixed. Some words were sorted beginning from lowest recognition rate to the highest recognition rate, but not all.

What about a function that displays all words that are below a specific recognition rate (e.g. “display all words that are below 90% recognition rate”)? I need a tool that helps me to sort out the bad audio files as efficiently as possible. I hope that a future version of sam will do the job.
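Until sam offers such a filter, the idea can be sketched outside of sam. The Python snippet below assumes the recognition rates are already available as a mapping from word to percentage; how to get them out of sam is left open. The function name `words_below` is hypothetical, and the two example values are the ones quoted in another post on this page.

```python
def words_below(rates, threshold=90.0):
    """Return (word, rate) pairs whose rate is below threshold, worst first."""
    bad = [(word, rate) for word, rate in rates.items() if rate < threshold]
    return sorted(bad, key=lambda pair: pair[1])

# Example values taken from another post on this page
rates = {"DIVERGIERST": 100.0, "DIN": 11.9569}
for word, rate in words_below(rates):
    print(f"{word}: {rate}%")
```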

HMM-definitions, Tiedlist, Dict, DFA

Saturday, April 24th, 2010

When I open Applications > Universal Access > sam > Input & output files, I see the following four options in the Output files area:

(a) HMM-definitions
(b) Tiedlist
(c) Dict
(d) DFA

When I am looking into simon, I can see the following options:

[Image: static-model]

(a) Hmm Definition
(b) Tiedlist
(c) Macros
(d) Stats

I find it confusing:
- (a) and (b) are obviously the same: sam can output the (a) Hmm Definition; simon can use the (a) Hmm Definition as input. sam can output (b) Tiedlist; simon can use the (b) Tiedlist that sam produced as input. So far so good.
- But what about (c) and (d)? Is (c) Dict the same as (c) Macros? Is (d) DFA the same as (d) Stats?

Why does sam produce (c) Dict and (d) DFA as output files? Is it possible to use these sam output files as simon input files?

Some clarification would be helpful.

git pull origin master; Serialize scenarios

Tuesday, February 16th, 2010

A few minutes ago, I built the simon development version:

$ cd Documents/201001/speech2text
$ git pull origin master
$ ./build_ubuntu.sh

It is working.

I would like to know how sam > Input & output files > Serialize scenarios | Serialize prompts is working. Is there a tutorial available?

How can I export a sam speech model?

Wednesday, February 3rd, 2010

I tried sam again after I had problems a couple of days ago (I used the same paths; I just had to fix the paths to the test files). I used the Build model and the Test model button in conjunction with my German backup files (about 200 German words can be recognized with these files when everything is configured correctly). It worked.

My question is: how can I import the model that I have built/tested with sam into simon?

simon offers to Manage scenarios: Import and Export are offered. I tried the Export button. This created an XML file:

<!DOCTYPE scenario>
<scenario version="1" icon="simon" name="General" lastModified="2010-02-03T14:06:50">
<simonCompatibility>
<minimumVersion>
<version>0.2.82</version>
</minimumVersion>
<maximumVersion/>
</simonCompatibility>
<authors>
<author>
<name>Anybody</name>
<contact>no@mail</contact>
</author>
</authors>
<licence>GPL</licence>
<vocabulary/>
<grammar>
<structure>Unknown</structure>
</grammar>
<actions/>
<trainingtexts/>
</scenario>
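A quick way to see what such an export actually carries is to list the sections that are empty. The Python sketch below uses the standard xml.etree.ElementTree module on a shortened copy of the file above. The compatibility, author and licence blocks are left out, and straight quotes are assumed around the attribute values. The helper name `empty_sections` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the exported scenario (straight quotes assumed)
SCENARIO = """<!DOCTYPE scenario>
<scenario version="1" icon="simon" name="General" lastModified="2010-02-03T14:06:50">
<vocabulary/>
<grammar>
<structure>Unknown</structure>
</grammar>
<actions/>
<trainingtexts/>
</scenario>"""

def empty_sections(xml_text):
    """Return the tags of direct children that have neither child elements nor text."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root
            if len(child) == 0 and not (child.text or "").strip()]

print(empty_sections(SCENARIO))
```

For this export, vocabulary, actions and trainingtexts are all empty; only the grammar structure Unknown is stored. So the file on its own cannot restore the speech model.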

I would like to be able to import my sam speech model into simon. How can I do this?

What can I import into simon?
- I can press the Import Dictionary button to import an active dictionary and/or a shadow dictionary.
- I can switch to the Grammar tab, press the Import button. I didn’t try this function yet. I am not too interested because currently I don’t need a grammar function (200 German words could be recognized without a grammar – only the terminal Unknown was used; 1000 German words should be possible without grammar, I hope). Of course, if I want to restore my German speech model (53 MB), it is necessary to restore the grammar, too. So this Import [Grammar] button might be useful.
- In the Training tab, I can press the Import Trainingsdata button. This should import the prompts file and the corresponding wav files (stored in the training.data folder).

When I take a look at sam > Static model, I can see fields for Base macros and Base stats. Where are these files from my own German 200 words speech model located? When I download the English acoustic model from Voxforge (HTK_AcousticModel-2010-02-03_16kHz_16bit_MFCC_O_D.tgz), I can see the following files:

[Image: macros-stats]

1. macros – this file is probably usable with sam > Input & output files > Static model > Base macros.
2. stats – probably usable with sam > Input & output files > Static model > Base stats.

When I build my own German speech model with sam, where are these files – macros and stats – located? Where can I find them? I assume that I need them if I want to restore my speech model for the usage with simon, but I am not sure.

This is what I want: Restore my 200 German words speech model (my current problem). Then I want to add more words to this speech model. I am planning to add about 10 words per day on average to my German speech model. It should grow continuously. And if something goes wrong, I want to be able to restore from my backup file because I don’t want to begin again and again from scratch.

The Manage scenarios > Import and Export buttons might be of help in the future.

I already said it earlier, and I say it again because it is important: the user doesn’t want to lose his own work (= wav recordings that were made with simon). It should be possible to back up (= export) and to restore (= import) all files that are necessary to build a working speech model.

For me, it is OK to specify each specific path like it is possible with sam. But in the end, it has to work with simon.

I want to fine tune my German speech model with sam. Especially, I want to sort out wav files that have a low recognition rate with sam. After I have fine-tuned my German speech model with sam, I want to use it with simon (=import it into simon). How is this possible?

sam: Couldn’t open prompts file

Tuesday, January 26th, 2010

I want to use my German backup folder with sam. I have to choose the specific paths to the specific backup files:

[Image: user-generated]

I am using the following path for the jconf file (I had to look into this blog post): /usr/share/kde4/apps/simond/default.jconf

This is the current content of the file /home/am3msi/Documents/201001/model/20100126-try-to-restore-german.sam:

/home/am3msi/Documents/201001/model/hmmdefs
/home/am3msi/Documents/201001/model/tiedlist
/home/am3msi/Documents/201001/model/model.dict
/home/am3msi/Documents/201001/model/model.dfa
/home/am3msi/Documents/201001/model/training.data/
/usr/share/kde4/apps/simond/default.jconf
/home/am3msi/Documents/201001/model/lexicon
/home/am3msi/Documents/201001/model/model.grammar
/home/am3msi/Documents/201001/model/model.voca
/home/am3msi/Documents/201001/model/prompts
/home/am3msi/Documents/201001/model/training.data/
/home/am3msi/Documents/201001/model/tree1.hed
/home/am3msi/Documents/201001/model/wav_config
16000
/home/am3msi/Documents/201001/model/prompts

Now, I click the Build model button.

[Image: build-log]

1. I pressed the Build model button.
2. The Build log indicates that it worked out. Great. I assume that the previously existing files
/home/am3msi/Documents/201001/model/hmmdefs
/home/am3msi/Documents/201001/model/tiedlist
/home/am3msi/Documents/201001/model/model.dict
/home/am3msi/Documents/201001/model/model.dfa

have been replaced by new ones (probably with the identical content).

Now I want to test the model. So I press the Test model button. sam displays an error message:

Couldn’t open prompts file for reading: /home/am3msi/Documents/201001/model/training.data/

Why is that? What went wrong? Let’s take a look at the paths to the test files:
/home/am3msi/Documents/201001/model/training.data/
/usr/share/kde4/apps/simond/default.jconf
/home/am3msi/Documents/201001/model/prompts

The paths are correct. I am trying the following: I copy the prompts file to the training.data folder. But this didn’t solve my problem.
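Note that the error message names the training.data/ folder, not the prompts file, as the file that could not be opened. One possible reading, and it is only an assumption, is that the lines of the .sam file are not in the order this version of sam expects, so a folder ends up in a slot meant for a file. The Python sketch below labels every line of a .sam file so that such a mismatch is easier to spot. It assumes nothing about what each line means, and the function name is hypothetical.

```python
import os

def describe_sam_lines(sam_text, exists=os.path.exists, isdir=os.path.isdir):
    """Label every line of a .sam file as 'directory', 'file', 'missing' or 'value'."""
    report = []
    for number, line in enumerate(sam_text.splitlines(), start=1):
        entry = line.strip()
        if not entry.startswith("/"):
            kind = "value"          # e.g. the sample rate 16000
        elif not exists(entry):
            kind = "missing"
        elif isdir(entry):
            kind = "directory"
        else:
            kind = "file"
        report.append((number, kind, entry))
    return report

# Usage (path taken from this post):
# with open("/home/am3msi/Documents/201001/model/20100126-try-to-restore-german.sam") as f:
#     for number, kind, entry in describe_sam_lines(f.read()):
#         print(number, kind, entry)
```

The `exists` and `isdir` parameters only exist so that the function can be tried out without touching the file system.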

My guess is that there is a bug with sam. At least, it is possible to build a speech model with sam (from my German backup files). That is a good start. That means that my German wav recordings, my dictionary, my prompts aren’t lost.

My next step will be to take a closer look at simon. I will try to use my German backup files with simon. They worked with sam (only the Test model function failed, but the Build model function obviously worked). And I hope that they will work with simon, too.

Find bad wav files with sam

Tuesday, December 29th, 2009

A. I am testing sam. I am not sure about the jconf file (error message: “Couldn't open julius jconf file: "".”). Where is it located? When I search my computer, I find adin.jconf, default.jconf, and Sample.jconf (each configuration file is stored at a different location). Which one should I choose?

1. I tested sam with Sample.jconf, and it seemed to work. But to be honest: Isn’t this just a dummy file? Every line begins with a # (number sign).

2. adin.jconf is very short, and everything is commented out.

3. default.jconf is probably the correct choice because some lines are valid, e.g.:

[...]
-h hmmdefs
[...]
-hlist tiedlist
[...]
-penalty1 5.0 # first pass
-penalty2 20.0 # second pass
[...]

Location of this file: /usr/share/kde4/apps/simond/default.jconf
I think that I will use sam with this file.

B. I assume that sam > Build model is the same as simon > Synchronize because the button is the same (green circular arrow).

This means that I can use simon for recording new words, and synchronize the speech model. In my last video, all words (more than 200 words) were recognized correctly. How can I eliminate wav files that are not so good? I need an efficient way to fix future problems. sam seems to fill the gap. With simon, I can record words, and synchronize. With sam, I can find out which words are below 100% confidence score. These words can be edited with sam (very nice feature).

So, simon is good for the main work. Fixing bad wav files can be done with sam. E.g. I found a wav file with a confidence score of about 89%. I edited this wav file with sam (sam offers the option to re-record a wav file). I will see whether this improves the confidence score after rebuilding (sam > Build model), and testing the model (sam > Test model).

C. It seems that sam is working without ksimond.

D. I don’t know yet how to handle wav files with multiple words. With sam, the confidence score of these wav files is 0 %. In simon, I added the grammar structure “Unknown Unknown Unknown Unknown”, but it still doesn’t work. But now, I found one sample which had a confidence score of 50%:

Result 6 of 10
=====================
Sentence: verlangsamende verlangsamendem verlangsamenden verlangsamendes
SAMPA:
Raw SAMPA:
Average Confidence: 50
Confidence Scores: 1.72309e-12 100 4.8642e-05 100

So I can say: it seems to work when I am adding the sentence structure “Unknown Unknown Unknown Unknown“.
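The Average Confidence in this output appears to be the arithmetic mean of the four per-word scores. That is an assumption, but it is consistent with the numbers shown: two words at 100 and two words close to 0 give about 50. A minimal Python check (the function name is hypothetical):

```python
def average_confidence(scores_line):
    """Parse a 'Confidence Scores:' line and return the arithmetic mean of its values."""
    values = [float(token) for token in scores_line.split(":", 1)[1].split()]
    return sum(values) / len(values)

line = "Confidence Scores: 1.72309e-12 100 4.8642e-05 100"
print(average_confidence(line))
```

So an average of 50 here means that only the second and the fourth word were recognized with confidence, not that the whole sentence was half right.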

E. The word deutschfeindlichen has a confidence score of 90.0133 %:

Result 1 of 3
=====================
Sentence: deutschfeindlichen
SAMPA:
Raw SAMPA:
Average Confidence: 90.0133
Confidence Scores: 90.0133

Result 2 of 3
=====================
Sentence: deutschfeindlichem
SAMPA:
Raw SAMPA:
Average Confidence: 9.98674
Confidence Scores: 9.98674

Result 3 of 3
=====================
Sentence: deutschfeindliche
SAMPA:
Raw SAMPA:
Average Confidence: 8.29019e-17
Confidence Scores: 8.29019e-17

So the alternatives deutschfeindlichem and deutschfeindliche have a much lower confidence score. This is fine. I can see how well it works internally: in the video (see link above), you can see 100 % perfect results, but internally it is just about 90 % for deutschfeindlichen.

Here is another example, Wortbreite:

Result 1 of 2
=====================
Sentence: Wortbreite
SAMPA:
Raw SAMPA:
Average Confidence: 0.0408209
Confidence Scores: 0.0408209

Result 2 of 2
=====================
Sentence: Wortbreiten
SAMPA:
Raw SAMPA:
Average Confidence: 99.9592
Confidence Scores: 99.9592

It should have recognized Wortbreite, but it recognized with more than 99 % confidence score the word Wortbreiten. So this match is wrong.
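Wrong matches like this one can be found automatically by comparing the hypothesis with the highest confidence against the word that was actually spoken. The Python sketch below parses result blocks in the format shown above. The function names are hypothetical, and the parser relies only on the "Sentence:" and "Average Confidence:" lines.

```python
def parse_results(log_text):
    """Return a list of (sentence, average_confidence) from sam result blocks."""
    results, sentence = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Sentence:"):
            sentence = line[len("Sentence:"):].strip()
        elif line.startswith("Average Confidence:") and sentence is not None:
            results.append((sentence, float(line.split(":", 1)[1])))
            sentence = None
    return results

def check_top_result(log_text, expected):
    """Return (best_sentence, is_correct) for the hypothesis with the highest confidence."""
    best, _ = max(parse_results(log_text), key=lambda pair: pair[1])
    return best, best.lower() == expected.lower()

LOG = """Result 1 of 2
=====================
Sentence: Wortbreite
SAMPA:
Raw SAMPA:
Average Confidence: 0.0408209
Confidence Scores: 0.0408209

Result 2 of 2
=====================
Sentence: Wortbreiten
SAMPA:
Raw SAMPA:
Average Confidence: 99.9592
Confidence Scores: 99.9592
"""
print(check_top_result(LOG, "Wortbreite"))
```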

F. Conclusion: I hope you got an impression of sam. sam seems to be a great tool for speech model development.

“sam can now test models”

Wednesday, September 9th, 2009

Interesting: “sam can now test models”. This means that I can get an acoustic model from VoxForge, and test this model with sam. Well, I want to build my own acoustic models with just my own voice. I hope that I can use sam for the building / testing process.

By the way, currently I am building an additional German pronunciation dictionary that someday will contain the missing words from my German audio files. When I have the missing words in the dictionary, I hope that the testing process with sam will give me good results.

sam: Could not initialize recognition

Friday, September 4th, 2009

I want to test a speech model with sam:

[Image: test]

1. I built the speech model. It was a lot of work, because I had to fix a lot of lexicon issues. But in the end it was successful.
2. I pressed the button Test model.
3. An error message occurred. I don’t know what went wrong. The paths should be correct:
a. I have specified a file for the test prompts: file:///home/liberty/200908/sam/michverstanden/test_prompts
b. And I have specified a folder for the wav test prompts:
/home/liberty/200908/sam/michverstanden/test

michverstanden.sam

Thursday, September 3rd, 2009

This is what I did today: I imported the German PLS dictionary into simon, and created an additional PLS dictionary. Of course, I imported this additional dictionary into simon, too.

I copied /home/liberty/.kde/share/apps/simon/model/lexicon to
/home/liberty/200908/sam/michverstanden/lexicon. Then, I copied /home/liberty/.kde/share/apps/simon/model/model.voca to /home/liberty/200908/sam/michverstanden/model.voca. After that, I configured sam with the parameters that are stored in the file /home/liberty/200908/sam/michverstanden/michverstanden.sam

I want to build a speech model using the German 01 prompts. I have these prompts in 16kHz / 16 bit from Voxforge: ralfherzog-20070816_de1.tgz. I made some modifications to the PROMPTS file (Ä instead of ä; Ö instead of ö; Ü instead of ü, SS instead of ß).

I tried to build the model with sam. But an error message occurred:

[Image: ampersand]

I don’t know how to solve this problem. Well, I have had some experience with the phoneme & in the past:

1. Ampersand (g & N @) could be compiled
2. model.voca: changing verb to noun

Obviously, the phoneme & has to be defined. But how could that be achieved? From my point of view, we could omit this phoneme, and replace it with the phoneme E. This means that I could try to solve the problem by exchanging the phoneme & with the phoneme E in the following files with gedit:

file:///home/liberty/200908/sam/michverstanden/lexicon
file:///home/liberty/200908/sam/michverstanden/model.voca

Maybe I will try that later.

Edit: I just replaced the phoneme & with E in the files lexicon and model.voca (same path as before). Then I tried to build the model with sam. Now sam displays the following message:

Phoneme undefined: Z

Well, I think that I have to train these phonemes. So it would have been sufficient to train the phoneme &. Probably, the German 01 prompts don’t contain the phonemes Z and &. So I should include prompts that contain these phonemes. Example for the phoneme Z:

IMAGE [Image] I m I Z

I think that this entry should be fixed (to I m I d Z). But not now.

I think that I will insert two single words that contain the phonemes Z and &. And I must not forget to add these entries to the prompts file.

Edit September 4, 2009: I recorded the wav file job-gaenge.wav with Audacity. Then I applied the following command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox job-gaenge.wav -r 16000 -c 1 -s job_gaenge.wav

Now I have the file job_gaenge.wav in my training folder. It is now necessary to modify the prompts file:

file:///home/liberty/200908/sam/michverstanden/prompts
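Before rebuilding, it can be worth checking that the converted file really has the target format. The Python sketch below uses the standard wave module and assumes the target is 16000 Hz, mono, 16 bit (2 bytes per sample), as for the Voxforge prompts mentioned above. The function name `check_wav` is hypothetical.

```python
import wave

def check_wav(path, rate=16000, channels=1, sample_width=2):
    """Return a list of problems; an empty list means the file matches the target format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()}, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width is {wav.getsampwidth()} bytes, expected {sample_width}")
    return problems

# Usage (file name taken from this post):
# print(check_wav("job_gaenge.wav"))
```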

The next step would be to build the speech model with sam. I will do that now. I just started sam. I have to open the file /home/liberty/200908/sam/michverstanden/michverstanden.sam. When trying to build the model, the following error message occurred:

Phoneme undefined: y

OK, I will have to define this phoneme, too. Now I will apply the following command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox ungluecks_.wav -r 16000 -c 1 -s ungluecks.wav

What is the problem now? The following error message appeared:

Error while coding the samples!

Please check the path to HCopy (/usr/local/bin/HCopy) and the wav config (/home/liberty/200908/sam/michverstanden/wav_config)

OK, I understand: I made a small mistake. I had added to the prompts file the following line:

ungluecks.wav UNGLÜCKS

This was wrong. The following line is the correct one:

ungluecks UNGLÜCKS
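Since the build error for this mistake points at HCopy and not at the prompts file, a small check before building can save time. The Python sketch below flags prompts lines whose sample name still carries the .wav extension or that have no transcription. The function name and the messages are hypothetical.

```python
def check_prompts(lines):
    """Yield (line_number, message) for prompts lines that look malformed."""
    for number, line in enumerate(lines, start=1):
        parts = line.split(maxsplit=1)
        if not parts:
            continue                      # ignore blank lines
        if len(parts) < 2:
            yield number, "no transcription after the sample name"
        elif parts[0].lower().endswith(".wav"):
            yield number, "sample name must not include the .wav extension"

problems = list(check_prompts(["ungluecks.wav UNGLÜCKS", "ungluecks UNGLÜCKS"]))
print(problems)
```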

A small mistake, and it doesn’t work. And again the same error message:

Phoneme undefined: y

I understand my mistake. Take a look into the lexicon:

UNGLÜCKS [Unglücks] U n g l Y k s

The Y and the y are different phonemes. I will train the following entry:

ÄGYPTEN [Ägypten] E g y p t n=

I don’t know why we are distinguishing between the Y and the y. The rule itself can be found in the Wiktionary:

[y] U+0079 only in foreign words: Physik /[fyˈsɪk]/
[ʏ] U+028F dünn /[dʏn]/, lüften /[ˈlʏftn̩]/, Symbol /[zʏmˈboːl]/

When I submit words for the dictionary acquisition project, I try to follow this rule. I don’t understand the sense of this rule, but it is a rule. We will have to discuss this issue. I applied the following command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox aegypten-aegypten.wav -r 16000 -c 1 -s aegypten_aegypten.wav

Another problem occurs:

Phoneme undefined: E:

I will add the following word:

ANSCHLÄGE [Anschläge] a n S l E: g @

I am executing the command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox anschlaege-anschlaege.wav -r 16000 -c 1 -s anschlaege_anschlaege.wav

OK, another phoneme is missing:

Phoneme undefined: OY

I will take the following entry:

MEHRWERTSTEUER [Mehrwertsteuer] m e: @ r v e: @ r t S t OY @ r

I am executing the command:

liberty@liberty-desktop:~/200908/sam/michverstanden/training.data$ sox mehrwertsteuer-mehrwertsteuer.wav -r 16000 -c 1 -s mehrwertsteuer_mehrwertsteuer.wav

OK, another problem:

Phoneme undefined: an

There is obviously an error in the lexicon:

ANFÄNGE [Anfänge] an fEN@

This is the corresponding entry in the PLS dictionary that had been imported:

 <lexeme>
  <grapheme>Anfänge</grapheme>
  <phoneme>an.fɛŋə</phoneme>
 </lexeme>

I will delete this entry from the following lexicon:

file:///home/liberty/200908/sam/michverstanden/lexicon

I might have to do that again when I replace this lexicon with a new one. So this is a good reminder for me.

OK, next problem: Phoneme undefined: dUNkl=

I think I know what the problem is. Next problem: Phoneme undefined: UnmItl=ba:rstn=

I have to delete the following lines:

UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=

Next problem: (more…)

Importing the Voxforge dictionary

Monday, August 31st, 2009

I am now importing the Voxforge dictionary into simon from this location: /home/liberty/200908/sam/english/VoxForgeDict. I had downloaded it from here (VoxForge.tgz). It is in HTK format. What does the HTK format look like? Here is a small excerpt from the dictionary:

APPROACH [APPROACH] ax p r ow ch
APPROACHABLE [APPROACHABLE] ax p r ow ch ax b ax l
APPROACHED [APPROACHED] ax p r ow ch t
APPROACHES [APPROACHES] ax p r ow ch ax z
APPROACHES(2) [APPROACHES] ax p r ow ch ix z
APPROACHING [APPROACHING] ax p r ow ch ix ng
APPROBATION [APPROBATION] ae p r ax b ey sh ax n

The VoxForge dictionary contains about 130k words.

First, I imported the /home/liberty/200908/sam/english/cmudict.0.6d. It is in Sphinx format:

APPROACH AH0 P R OW1 CH
APPROACHABLE AH0 P R OW1 CH AH0 B AH0 L
APPROACHED AH0 P R OW1 CH T
APPROACHES AH0 P R OW1 CH AH0 Z
APPROACHES(2) AH0 P R OW1 CH IH0 Z
APPROACHING AH0 P R OW1 CH IH0 NG
APPROBATION AE2 P R AH0 B EY1 SH AH0 N

So you now know the difference between a dictionary that is stored in HTK format and one that is stored in Sphinx format. Both dictionaries – VoxForgeDict and cmudict.0.6d – contain each about 130k words. I don’t know whether they share the same phoneme set or not. My guess is that both lexicons are using CMU-40 but I don’t know, so I could be wrong!
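Whether both dictionaries use the same phoneme set can be checked by collecting the symbols each one uses. The Python sketch below parses the two line formats shown in the excerpts. It assumes that the HTK output symbol in square brackets contains no spaces, which holds for the excerpt. The function names are hypothetical.

```python
import re

VARIANT = re.compile(r"\(\d+\)$")   # e.g. the "(2)" in APPROACHES(2)

def parse_htk(line):
    """Split 'WORD [OUTPUT] p1 p2 ...' into (word, phonemes)."""
    word, _output, *phonemes = line.split()
    return VARIANT.sub("", word), phonemes

def parse_sphinx(line):
    """Split 'WORD P1 P2 ...' into (word, phonemes)."""
    word, *phonemes = line.split()
    return VARIANT.sub("", word), phonemes

def phoneme_set(lines, parser):
    """Collect every distinct phoneme symbol used in the given lines."""
    return {p for line in lines for p in parser(line)[1]}

htk = ["APPROACH [APPROACH] ax p r ow ch",
       "APPROACHES(2) [APPROACHES] ax p r ow ch ix z"]
sphinx = ["APPROACH AH0 P R OW1 CH",
          "APPROACHES(2) AH0 P R OW1 CH IH0 Z"]
print(sorted(phoneme_set(htk, parse_htk)))
print(sorted(phoneme_set(sphinx, parse_sphinx)))
```

In the excerpts, the Sphinx entries carry stress digits (AH0, OW1), while the HTK entries are lowercase without digits and use ax and ix where the Sphinx file has AH0 and IH0. So the symbols differ at least on the surface, even if both go back to the same underlying set.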

I think that I will stick to VoxForgeDict because it is in HTK format.

So, what will be my next step? I want to train a few words with simon (words: this, is, a, different, approach). Then I will compile the speech model (= synchronize with ksimond). After that, I will try whether simon recognizes my voice.

If it works, I will try to make a test with sam. I want to test the sentence This is a different approach. with sam. This is the first sentence of my English files.

I have to get familiar with the whole training and testing process. In the long term, I want to use sam for model creation and model testing.

I just imported the lexicon VoxForgeDict into simon. Maybe I should define a grammar now? I just did that: Now I have a grammar with just one category: Unknown. I know that this isn’t sufficient for testing the whole sentence This is a different approach., but I will try to fix that later when the problem occurs.

I am adding now the word this:

[Image: this]

I can’t record the word because I can’t restart simon. I think that I will have to get the current snapshot via svn.

Experimenting with sam

Monday, August 24th, 2009

I am experimenting with sam. Here is the content of the file german.sam:

/home/liberty/200908/sam/german/hmmdefs
/home/liberty/200908/sam/german/tiedlist
/home/liberty/200908/sam/german/model.dict
/home/liberty/200908/sam/german/model.dfa
/home/liberty/200908/sam/german/training.data/
/home/liberty/200908/sam/german/training.data/
/home/liberty/200908/sam/german/lexicon
/home/liberty/200908/sam/german/model.grammar
/home/liberty/200908/sam/german/model.voca
/home/liberty/200908/sam/german/prompts
/home/liberty/200908/sam/german/prompts
/home/liberty/200908/sam/german/tree1.hed
/home/liberty/200908/sam/german/wav_config
16000
/home/liberty/200908/sam/german/julius.jconf

Obviously, I have to make some adjustments to the lexicon, e.g. the error message Phoneme undefined: an occurred. The solution is to delete the line

ANFÄNGE [Anfänge] an fEN@

in the file:///home/liberty/200908/sam/german/lexicon. After I have deleted this line, I save the file. Then I click Build model again. I have to wait a few moments.

And now, the message Phoneme undefined: pf appears. I will record the word Kopfes with Audacity (22050 hertz). Then I run the command liberty@liberty-desktop:~/200908/sam/german$ sox kopfess.wav -r 16000 -c 1 -s kopfes.wav. Then I move the file kopfes.wav to the folder /home/liberty/200908/sam/german/training.data. Now it is necessary to add the following line to the prompts file:

kopfes KOPFES

After saving the prompts file, I will press the Build model button again. Now the error message Phoneme undefined: dUNkl= appears. I have to delete the following lines that are marked in bold:

DUNKEL [dunkel] dUNkl=
DUNKELSTE [dunkelste] d U N k @ l s t @
DUNKELSTE [dunkelste] dUNkl=st@
DUNKELSTEM [dunkelstem] d U N k @ l s t @ m
DUNKELSTEN [dunkelsten] d U N k @ l s t n=
DUNKELSTEN [dunkelsten] dUNkl=stn=

And this is the next error message: Phoneme undefined: UnmItl=ba:rstn=. I will delete the following lines:

UNMITTELBAR [unmittelbar] UnmItl=ba:r
UNMITTELBARE [unmittelbare] UnmItl=ba:r@
UNMITTELBAREM [unmittelbarem] UnmItl=ba:r@m
UNMITTELBAREN [unmittelbaren] UnmItl=ba:r@n
UNMITTELBARER [unmittelbarer] UnmItl=ba:r@ r
UNMITTELBARERE [unmittelbarere] UnmItl=ba:r@r@
UNMITTELBARES [unmittelbares] UnmItl=ba:r@s
UNMITTELBARSTE [unmittelbarste] UnmItl=ba:rst@
UNMITTELBARSTEN [unmittelbarsten] UnmItl=ba:rstn=

It seems that this is a good way to find out what went wrong during the import of the PLS dictionary. There are some inconsistencies that have to be fixed.
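Instead of fixing one error per build, the lexicon can be scanned once for every symbol that is not a known phoneme. The Python sketch below assumes the set of defined phonemes is supplied by hand; where sam or HTK takes that set from is not covered here. The function name is hypothetical, and the example set is made up for illustration.

```python
def undefined_phonemes(lexicon_lines, defined):
    """Map each undefined symbol to the lexicon words that use it."""
    report = {}
    for line in lexicon_lines:
        open_bracket = line.find("[")
        close_bracket = line.find("]", open_bracket)
        if open_bracket < 0 or close_bracket < 0:
            continue                      # not a 'WORD [Output] phonemes' line
        word = line[:open_bracket].strip()
        for symbol in line[close_bracket + 1:].split():
            if symbol not in defined:
                report.setdefault(symbol, []).append(word)
    return report

lexicon = ["DUNKEL [dunkel] dUNkl=",
           "DUNKELSTE [dunkelste] d U N k @ l s t @",
           "DUNKELSTE [dunkelste] dUNkl=st@"]
defined = set("d U N k @ l s t".split())   # hypothetical phoneme set
print(undefined_phonemes(lexicon, defined))
```

Entries whose transcription was imported as one unsegmented string (such as dUNkl=) show up as a single undefined symbol, which is exactly what the error messages above report.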

Next error message: Phoneme undefined: tUnl=. I have to delete the lines that are emphasized:

TUNNEL [Tunnel] tUnl=
TUNNELN [Tunneln] t U n @ l n
TUNNELN [Tunneln] tUnl=n=
TUNNELS [Tunnels] t U n @ l s
TUNNELS [Tunnels] tUnl=s

Error message: Phoneme undefined: SA:s@n
Deleting the lines:

CHANCE [Chance] SA:s@
CHANCEN [Chancen] SA:s@n

Maybe there was a French vowel in the PLS dictionary? I will take a look into it. Yes:

Chance ʃɑ̃ːsə
Chancen ʃɑ̃ːsən

Ugly, but I think that in the long term we might need the French vowels. Or should we use similar German vowels? We could use e.g. ʃɔsən. Not very good, but it could be sufficient.

Error message: Phoneme undefined: mIta:%baIt@
I have to delete the following line:

MITARBEITER [Mitarbeiter] mIta:%baIt@ r

The corresponding entry in the PLS dictionary:

Mitarbeiter mɪtʔaːˌbaɪ̯tɐ

Normally, the PLS dictionary doesn’t contain any stress information. Why not? Because it is easier. But obviously, this entry contains a stress mark.

Error message: Phoneme undefined: fIrtl=fi:nal@s

Removing the bold marked lines:

VIERTELFINALE [Viertelfinale] fIrtl=fi:nal@
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l @ n
VIERTELFINALEN [Viertelfinalen] f I r t @ l f i: n a l n=
VIERTELFINALEN [Viertelfinalen] fIrtl=fi:naln=
VIERTELFINALES [Viertelfinales] f I r t @ l f i: n a l @ s
VIERTELFINALES [Viertelfinales] fIrtl=fi:nal@s

Error message: Phoneme undefined: arti:kl=

I have to remove the following lines:

ARTIKEL [Artikel] arti:kl=
ARTIKELN [Artikeln] arti:kl=n

Error message: Phoneme undefined: aIntsl=handl=s

I have to remove the bold marked lines:

EINZELHANDEL [Einzelhandel] aInts@lhandl=
EINZELHANDEL [Einzelhandel] aIntsl=handl=
EINZELHANDELS [Einzelhandels] aI n ts @ l h a n d @ l s
EINZELHANDELS [Einzelhandels] aIntsl=handl=s

Error message: Phoneme undefined: fo:gl=s
Removing the lines:

VOGEL [Vogel] fo:gl=
VOGELS [Vogels] fo:gl=s

Message: Phoneme undefined: [Los

I think the problem is the word Los Angeles:

LOS [Los] l o: s
LOS ANGELES [Los Angeles] l O s & n d Z @ l I s
LOSE [Lose] l o: z @

I am removing the entry “Los Angeles”.

Well, lots of problems so far. Let’s see what the next error message will be: Phoneme undefined: [New

I think I know which word is wrong. Is it New York? Of course it is:

NEW YORK [New York] n j u j O R k
NEW YORKS [New Yorks] n j u j O R k s

I will remove these two lines.

It seems that expressions that consist of two single words are causing problems.
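The error messages contain fragments of the entry itself ([Los, [New), which suggests that the space inside the word confuses the parser; this is an assumption. Such entries can be listed up front with a short Python sketch (hypothetical function name):

```python
def multiword_entries(lexicon_lines):
    """Return lexicon lines whose word part (before '[') contains whitespace."""
    hits = []
    for line in lexicon_lines:
        word = line.split("[", 1)[0].strip()
        if len(word.split()) > 1:
            hits.append(line)
    return hits

lexicon = ["LOS [Los] l o: s",
           "LOS ANGELES [Los Angeles] l O s & n d Z @ l I s",
           "NEW YORK [New York] n j u j O R k"]
for line in multiword_entries(lexicon):
    print(line)
```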

Phoneme undefined: NAMEN
Remove the line:
IM NAMEN [im Namen] I m n a: m @ n

I think that there are more entries that will cause errors.

Phoneme undefined: ta:fl=

Removing the bold marked lines:

TAFEL [Tafel] ta:fl=
TAFELN [Tafeln] t a: f @ l n
TAFELN [Tafeln] ta:fl=n

Phoneme undefined: dE:@
Remove:
DERWEIL [derweil] dE:@ rwaIl

Phoneme undefined: bi:bl=n
Remove the bold marked lines:

BIBEL [Bibel] bi:bl=
BIBELN [Bibeln] b i: b @ l n
BIBELN [Bibeln] bi:bl=n

Phoneme undefined: bRA:S@
Remove the lines:
BRANCHE [Branche] bRA:S@
BRANCHEN [Branchen] bRA:Sn=

It is the same problem as above (Chance).

I will try one more time to build the model.

Phoneme undefined: foeh:gl=n
I will remove the following lines:
VÖGEL [Vögel] foeh:gl=
VÖGELN [Vögeln] foeh:gl=n

OK, that is enough for now. Let’s stop here.

“JEDER HAT EIN STÜCK BEKOMMEN”

Tuesday, August 11th, 2009

I tested with sam a German sentence that is part of the Voxforge collection:

1. I took a look into http://script.blau.in/german/27/prompts.xml. Then I downloaded ralfherzog-20071213-de27.tgz (6.2 MB). It contains FLAC files in 16 kHz / 16bit. I don’t know how I can convert FLAC files (48 kHz) into wav files (16 kHz). So I take the Voxforge version with 16 kHz.

2. After unpacking, I converted file:///home/liberty/200908/ralfherzog-20071213-de27/flac/de27-60.flac with SoundConverter into
file:///home/liberty/.kde/share/apps/simon/model/samtestwav.data/de27-60.wav.

3. I changed the test prompts base path to /home/liberty/.kde/share/apps/simon/model/samtestwav.data.

4. I pressed the Test model button. This is the test log (emphasis by me):

/usr/bin/sox -2 -s -L /home/liberty/.kde/share/apps/simon/model/samtestwav.data/de27-60.wav /home/liberty/.kde/tmp-liberty-desktop/sam/internalsamuser/test/samples/de27-60.wav
Preperation
Recoding audio...
Generating MLF...
Recognizing...
Prompts entry: JEDER HAT EIN STÜCK BEKOMMEN
Received recognition result for: /home/liberty/.kde/tmp-liberty-desktop/sam/internalsamuser/test/samples/de27-60.wav: Therapien
Analyzing recognition results...
Finished

5. All words (jeder, hat, ein, Stück, bekommen) are part of my simon shadow dictionary. But they aren’t part of the active lexicon yet.

6. The recognition rate of the individual words and of the whole sentence was 0. Maybe I should train these words with simon first, and then try again.

I changed the field Test-Prompts

Tuesday, August 11th, 2009

After reading these instructions, I created the file /home/liberty/.kde/share/apps/simon/model/prompts_test. It has the following content:

Hamburgern_S2_2009-07-19_22-44-21 HAMBURGERN
aufgenommen_S2_2009-07-25_11-36-59 AUFGENOMMEN
Fraktionen_S1_2009-07-19_19-33-01 FRAKTIONEN
breiteren_S2_2009-07-24_15-23-54 BREITEREN
Expansionen_S2_2009-07-19_18-05-20 EXPANSIONEN
Hannovers_S2_2009-07-19_23-07-09 HANNOVERS
Hannovers_S1_2009-07-19_23-07-02 HANNOVERS

Then I changed the input file for the field Test-Prompts:

[Screenshot: test-prompts]

The path for the wav prompts and the wav test prompts is the same.

Here is a small excerpt from the log (emphasis by me):

/usr/bin/sox -2 -s -L /home/liberty/.kde/share/apps/simon/model/training.data//Hamburgern_S2_2009-07-19_22-44-21.wav /home/liberty/.kde/tmp-liberty-desktop/sam/internalsamuser/test/samples/Hamburgern_S2_2009-07-19_22-44-21.wav

The wav files that are mentioned in prompts_test are copied to a sub-folder of internalsamuser. This is a temporary folder.
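
Each line of prompts_test pairs a sample name with its transcription, and the log excerpt suggests that the sample name plus .wav is looked up below the base path. A minimal sketch of that mapping, assuming exactly this convention; parse_prompts and wav_path are names invented here, not functions of sam:

```python
from pathlib import PurePosixPath

def parse_prompts(text):
    """Split each non-empty line into (sample name, transcription)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first blank separates the sample name from the words.
        name, _, words = line.partition(" ")
        entries.append((name, words.strip()))
    return entries

def wav_path(base_path, sample_name):
    """Assumed convention: sample name plus '.wav' below the base path."""
    return str(PurePosixPath(base_path) / (sample_name + ".wav"))

sample = """Hamburgern_S2_2009-07-19_22-44-21 HAMBURGERN
Hannovers_S1_2009-07-19_23-07-02 HANNOVERS"""

base = "/home/liberty/.kde/share/apps/simon/model/training.data"
for name, words in parse_prompts(sample):
    print(wav_path(base, name), "->", words)
```

A check like this makes it easy to spot prompt lines whose wav file is missing before pressing Test model.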

And this is the test result:

[Screenshot: sentences]

sam couldn’t open julius jconf file

Tuesday, August 11th, 2009

I want to figure out how sam works.

[Screenshot: jconf]

1. I pressed Test model.
2. About 30 seconds later the error message Couldn't open julius jconf file: "". appeared.
3. I guess that I have to define a path for the JConf file. I think that I will give the following file a try: /home/liberty/200908/speech2text/trunk/simond/default.jconf.

By the way, forget about the red question mark in the screenshot of the previous post. The paths are as follows:
/usr/share/kde4/apps/simon/model/tree.hed
/usr/share/kde4/apps/simon/model/wav_config

I am getting the following error message:

[Screenshot: recognizing]

I get this error message even if I activate/connect simon/ksimond. I guess I have to use a different default.jconf file (at a different location) because “relative paths must be relative to THIS FILE”.
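
The quoted rule means that a file name inside the jconf is looked up next to the jconf file, not in the directory sam was started from. A small sketch of that rule; the function name resolve_from_jconf is invented, and hmmdefs is only a guessed example of a name that might appear in the jconf:

```python
from pathlib import PurePosixPath

def resolve_from_jconf(jconf_path, referenced):
    """Resolve a path that appears inside a jconf file.

    Absolute paths are kept as they are; relative ones are taken
    relative to the directory that contains the jconf file itself.
    """
    ref = PurePosixPath(referenced)
    if ref.is_absolute():
        return str(ref)
    return str(PurePosixPath(jconf_path).parent / ref)

jconf = "/home/liberty/200908/speech2text/trunk/simond/default.jconf"
print(resolve_from_jconf(jconf, "hmmdefs"))
```

If the resolved file does not exist at that place, a jconf copied from the source tree cannot work unchanged, which would fit the error above.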

I am giving the following location a try: /home/liberty/.kde/share/apps/simond/models/a/active/julius.jconf. It seems to work:

[Screenshot: test-log]

And this could be the overall recognition rate. But it isn’t. Only about 49% were recognized correctly. I think that I will listen to some of the wav files. Maybe I will have to throw away some of the training samples?
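
How sam arrives at its overall figure is not shown here. As a rough cross-check, the share of exactly matching results can be counted by hand. A minimal sketch; the function name and the four result pairs are invented for illustration and are not taken from the test log:

```python
def recognition_rate(results):
    """results: list of (expected, recognized) string pairs.

    Returns the share of exact matches in percent, ignoring case
    and surrounding whitespace.
    """
    if not results:
        return 0.0
    hits = sum(
        1 for expected, recognized in results
        if expected.strip().casefold() == recognized.strip().casefold()
    )
    return 100.0 * hits / len(results)

# Hypothetical pairs: prompt text on the left, recognizer output on the right.
results = [
    ("HAMBURGERN", "Hamburgern"),
    ("FRAKTIONEN", "Funktionen"),
    ("HANNOVERS", "Hannovers"),
    ("BREITEREN", "bereiteren"),
]
print(recognition_rate(results))
```

With these four pairs, two match, so the sketch reports 50.0. sam may well weight confidence scores instead, so its number does not have to agree with such a count.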

sam: path of the input files

Monday, August 10th, 2009

Let’s take a look into the Voxforge forum (emphasis by me):

“replacing the model files in ~/.kde/share/apps/simond/models//active with the voxforge model files will work (I actually already tested that a while ago).”

Let’s compare the path with the paths that are proposed by sam:

[Screenshot: sam]

Most of the paths are clear (marked in green, blue, brown). But where are the input files located that I marked with red color (tree.hed, wav_config, JConf)?
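
One way to answer that question is to search the likely directories for the three file names. The sketch below does that; the two root directories are only guesses based on paths that appear elsewhere in these posts, and find_files is a name invented for this example:

```python
from pathlib import Path

def find_files(roots, names):
    """Return {name: [paths]} for every wanted file name found below the roots."""
    wanted = set(names)
    found = {name: [] for name in names}
    for root in roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # skip candidate directories that do not exist
        for path in root.rglob("*"):
            if path.is_file() and path.name in wanted:
                found[path.name].append(str(path))
    return found

hits = find_files(
    ["~/.kde/share/apps", "/usr/share/kde4/apps"],  # assumed candidates
    ["tree.hed", "wav_config", "default.jconf"],
)
for name, paths in hits.items():
    print(name, paths)
```

An empty list for a name means that the file is not below any of the candidate directories and has to be looked for elsewhere.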

I hope that there will be a sam handbook, too.

Let’s compare the path proposed by sam, /home/liberty/.kde/share/apps/simon/model, with the path mentioned in the Voxforge post quoted at the beginning of this article. They are not the same: the Voxforge post points to the simond folder, while sam proposes the simon folder.

And what about “the option of automatically downloading a recent voxforge snapshot”? Obviously, at the moment it isn’t possible to download a speech model directly from Voxforge.

sam: test prompts

Friday, August 7th, 2009

I checked out revision 891:

liberty@liberty-desktop:~/200907$ svn co https://speech2text.svn.sourceforge.net/svnroot/speech2text/

Then I tried to build simon / sam:

liberty@liberty-desktop:~/200907/speech2text/trunk$ ./build_ubuntu.sh

During the compilation, an error message appeared. I don’t know what went wrong, so I will try again later.

I think that sam will be very useful for testing speech models:

[Screenshot: sam-test]

I opened the file /home/liberty/200907/speech2text/trunk/sam/src/main.ui with Qt Creator. You can see that it is possible to define a path for test prompts (text file) / test prompts base path (corresponding wav files). I will try that with German Voxforge prompts. My goal is to test up to about 100 prompts (utterances) at a time.
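
Cutting a larger prompts file down to about 100 utterances could be done before handing it to sam. A minimal sketch, assuming one prompt per line; pick_prompts and its parameters are invented for this example:

```python
import random

def pick_prompts(lines, limit=100, seed=0):
    """Return at most `limit` non-empty prompt lines, chosen reproducibly."""
    prompts = [line for line in lines if line.strip()]
    if len(prompts) <= limit:
        return prompts
    # A fixed seed gives the same selection on every run.
    rng = random.Random(seed)
    return rng.sample(prompts, limit)

all_prompts = ["sample_%03d WORT" % i for i in range(250)]  # placeholder lines
subset = pick_prompts(all_prompts)
print(len(subset))
```

The selected lines would then be written to a new text file and entered in the test prompts field.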

simon acoustic modeler

Thursday, July 30th, 2009

I checked out revision 886. With Qt Creator, I took a look at the file /home/liberty/200907/speech2text/trunk/sam/src/main.ui.

[Screenshot: sam-main]

I opened the location /home/liberty/.kde/share/apps/simon/model (1). Then I tried to compare the different files (2-10). Maybe it is a different path? I don’t know.

Would it be possible to download julius-voxforge_0.1.1~build726.orig.tar.gz (found at packages.ubuntu.com), and use the files that are contained in this package in conjunction with sam?

I think that on my computer the temporary path would be /home/liberty/.kde/tmp-liberty-desktop/simond/a/compile. Every time I synchronize the speech model with simon, most of the files in this folder are updated. The file tree.hed can be found at this location, too.

A wav_config file can be found here: /home/liberty/.kde/share/apps/simond/models/a/src/2009-07-26_13-12-47/wav_config. But this path is temporary.

I hope that some day I will be able to prepare a German language package with the simon acoustic modeler that could be imported into a future version of simon.