Posts Tagged ‘Wav’

sam: test prompts

Friday, August 7th, 2009

I checked out revision 891:

liberty@liberty-desktop:~/200907$ svn co https://speech2text.svn.sourceforge.net/svnroot/speech2text/

Then I tried to build simon / sam:

liberty@liberty-desktop:~/200907/speech2text/trunk$ ./build_ubuntu.sh

An error message appeared during compilation. I don’t know what went wrong, so I will try again later.

I think that sam will be very useful for testing speech models:

[Screenshot: sam-test]

I opened the file /home/liberty/200907/speech2text/trunk/sam/src/main.ui with Qt Creator. You can see that it is possible to define a path for test prompts (text file) / test prompts base path (corresponding wav files). I will try that with German Voxforge prompts. My goal is to test up to about 100 prompts (utterances) at a time.

Moving the wav files

Friday, July 24th, 2009

I want to move the wav files from the current path to a different location.

[Screenshot: hitachi-wav1]

1. Currently, the wav files are stored in the folder /media/Hitachi/simon-xp-wav. I have to mount the Hitachi hard drive before simon is able to store a recorded wav file on this hard drive. I want to avoid that.

2. Let’s take a look at some of the wav files. They are ordered alphabetically.

3. A few hours ago, I ran a test. I added the word bətsiːʊŋsvaɪ̯zə to the active vocabulary, using IPA symbols instead of the German word. As a result, a file named bətsiːʊŋsvaɪ̯zə_S1_2009-07-24_04-31-09.wav was stored. simon was able to compile the speech model, but when I dictated bətsiːʊŋsvaɪ̯zə, simon didn’t write bətsiːʊŋsvaɪ̯zə as I intended. Instead, it wrote btsisvaz (or something like that, I can’t remember exactly). This result is not usable, so I deleted the entry bətsiːʊŋsvaɪ̯zə from my active vocabulary. Even so, the recordings of this word were not deleted from the folder /media/Hitachi/simon-xp-wav. Maybe both samples slipped through the deletion mechanism?

4. You can see the files de27-02.wav and de27-03.wav. These files were not recorded with simon; I imported them from Voxforge. Both files are 16 kHz. They influence the speech model, so the import was successful. I want to add more wav files from Voxforge in the future.

5. Each wav file is between about 72 KB and 344 KB in size.

Now, I have copied the wav files to the folder /home/liberty/.kde/share/apps/simon/model/training.data. I will delete the folder /media/Hitachi/simon-xp-wav later.
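The copy step could also be scripted. The following is only a minimal sketch, not the way the files were actually copied here; the function name copy_wav_files is made up for illustration, and the two folder paths from this post would be passed in as arguments.

```python
import shutil
from pathlib import Path


def copy_wav_files(source_dir, target_dir):
    """Copy every *.wav file from source_dir into target_dir.

    Returns the sorted list of copied file names. Other files are ignored.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    # Create the target folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for wav in sorted(source.glob("*.wav")):
        # copy2 also keeps the modification time of the recording.
        shutil.copy2(wav, target / wav.name)
        copied.append(wav.name)
    return copied
```

Because the files are copied rather than moved, the old folder stays intact until it is deleted by hand.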

I changed the path to the training samples to the new location. Now I will connect, and then synchronize. It seems to work.

Adding prompts from Voxforge

Tuesday, July 21st, 2009

I just added two lines (the last two shown below) to the file /home/liberty/.kde/share/apps/simon/model/prompts:

Herzog_S2_2009-07-19_23-45-26 HERZOG
organisiert_S1_2009-07-20_00-21-32 ORGANISIERT
Flaschen_S2_2009-07-19_18-59-50 FLASCHEN
de27-02 DAS HAUS IST NEU GEBAUT WORDEN
de27-03 DAS WETTER IST SEHR SCHLECHT
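Each line pairs a sample name with its transcription. Here is a small sketch of how one might check which prompts entries still lack a wav file. It assumes that the first space separates the name from the transcription and that the sample is stored as name plus ".wav"; the helper names parse_prompts and missing_samples are hypothetical.

```python
from pathlib import Path


def parse_prompts(text):
    """Map each sample name to its transcription (one entry per line)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the first space separates the name from the text.
        name, _, transcription = line.partition(" ")
        entries[name] = transcription.strip()
    return entries


def missing_samples(entries, samples_dir):
    """Return the sample names that have no <name>.wav in samples_dir."""
    folder = Path(samples_dir)
    return sorted(
        name for name in entries if not (folder / f"{name}.wav").is_file()
    )
```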

Now, I will have to add the corresponding FLAC/wav files. I have to do a conversion from FLAC to wav:

[Screenshot: flac-wav]

1. I have opened the folder /home/liberty/200907/ralfherzog-20071213-de27/flac.
2. The audio files de27-02.flac and de27-03.flac have to be converted into the wav format.
3. The suffix will be .wav.
4. I have to select the wav format.

[Screenshot: move-wav]

5. I moved the two wav files to /media/Hitachi/simon-xp-wav.
6. You can see that this folder contains lots of wav files that have been recorded with simon.

Now I have started simon and ksimond and pressed the Synchronize button. There was no error message, but the word Wetter isn’t included in the active word list (it is part of the shadow dictionary).

I guess that I have forgotten to adjust the TrainingDate value.

One question remains open: how would simon know which pronunciation to choose if there were several pronunciations available? It is possible that the answer has been given in a comment on this blog, but I can’t remember the details at the moment.

How to import Voxforge models / prompts

Tuesday, July 21st, 2009

There is some interesting information in the Voxforge forum (the following quotes are from the Voxforge forum unless marked otherwise):

“I am assuming that since it uses HTK format acoustic models, you should be able to just replace the hmmdefs, macros and tiedlist files with VoxForge’s versions of these files.”

I had the same thought, but I need the exact details. Here they are:

“replacing the model files in ~/.kde/share/apps/simond/models/<your user>/active with the voxforge model files will work”

It would be possible to add wav files:

“You can of course add samples to them but make sure you place them in the configured samples folder so they will be found during the compiling of the model.”

So this means that I could add wav files from Voxforge to the following location on my computer:

[Screenshot: training-path]

And I would have to adjust the prompts file:

“The prompts file is located at ~/.kde/share/apps/simon/model/prompts.”

Here are a few lines from my prompts file (location: /home/liberty/.kde/share/apps/simon/model/prompts):

Computer_S2_2009-07-18_16-53-22 COMPUTER
Grenzen_S2_2009-07-19_20-41-13 GRENZEN
Geschenken_S1_2009-07-19_20-04-16 GESCHENKEN
Technologie_S1_2009-07-19_10-58-49 TECHNOLOGIE
Gewichten_S2_2009-07-19_20-38-22 GEWICHTEN

And I would have

“to manually update the “TrainingsDate” value to the current date/time in the file: ~/.kde/share/apps/simon/model/modelsrcrc”

On my computer, the file /home/liberty/.kde/share/apps/simon/model/modelsrcrc has the following content:

GrammarDate=2009,7,13,16,43,50
LanguageDescriptionDate=2009,5,5,19,16,34
TrainingDate=2009,7,20,22,51,26
WordListDate=2009,7,20,22,49,38

Let’s take a closer look at TrainingDate:

year,month,day, = 2009,7,20,
hour (24 hour format),minute,second = 22,51,26
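To illustrate the format, here is a minimal sketch that writes and reads such a value. It assumes the six numbers are written without leading zeros, as in the file shown above (month 7, not 07); the function names are made up for this example and are not part of simon.

```python
from datetime import datetime


def format_training_date(moment):
    """Format a datetime as year,month,day,hour,minute,second (no padding)."""
    return (
        f"{moment.year},{moment.month},{moment.day},"
        f"{moment.hour},{moment.minute},{moment.second}"
    )


def parse_training_date(value):
    """Turn a value such as 2009,7,20,22,51,26 back into a datetime."""
    return datetime(*(int(part) for part in value.split(",")))


def update_training_date(config_text, moment):
    """Return config_text with the TrainingDate line set to moment.

    All other lines are kept unchanged.
    """
    lines = []
    for line in config_text.splitlines():
        if line.startswith("TrainingDate="):
            line = "TrainingDate=" + format_training_date(moment)
        lines.append(line)
    return "\n".join(lines) + "\n"
```

Calling update_training_date with datetime.now() and writing the result back to modelsrcrc would correspond to the manual update described in the forum quote.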

I think that it would be best if I tried to import a few German wav files from Voxforge (16 kHz) into /media/Hitachi/simon-xp-wav. I have just downloaded ralfherzog-20071213-de27.tgz (6.2 MB). It contains FLAC files. They should be converted to wav files before/while inserting them into the folder /media/Hitachi/simon-xp-wav. And I would have to add the content of PROMPTS (from ralfherzog-20071213-de27.tgz) to the file /home/liberty/.kde/share/apps/simon/model/prompts.

I think that the concept is as follows: simon manages the wav files, the prompts, and TrainingDate. It ‘gives’ them (via TCP/IP) to simond which generates hmmdefs and tiedlist.

My goal is to use simon/simond for the model generation. And I want to import wav files with the corresponding prompts from Voxforge.

I deleted the word ‘Taiwans’

Monday, July 20th, 2009

I just removed the word Taiwans from my active vocabulary. The word had been trained 10 times. My current strategy is (active vocabulary contains 160 entries):

- When a word is recognized incorrectly, I train it two more times.
- When a word has been trained 8 times, I train it one more time.
- When a word has been trained 9 times, I train it once more.
- When a word has been trained 10 times and a recognition error occurs, I remove it from the active vocabulary.
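The rules above leave some room for interpretation. One possible reading, written as a small sketch: a recognition error triggers two more trainings, only one more if the word has already been trained 8 or 9 times, and removal once it has been trained 10 times. The function name next_step and the returned labels are made up for this example.

```python
def next_step(times_trained, recognition_error):
    """Return an (action, count) pair for a word in the active vocabulary.

    action is "keep", "train" or "remove"; count is the number of
    additional training runs (0 unless action is "train").
    """
    if not recognition_error:
        # Nothing to do as long as the word is recognized correctly.
        return ("keep", 0)
    if times_trained >= 10:
        # Trained 10 times and still wrong: give up on the word.
        return ("remove", 0)
    if times_trained >= 8:
        # Trained 8 or 9 times: only one more run.
        return ("train", 1)
    return ("train", 2)
```

Under this reading, Taiwans (trained 10 times, then recognized incorrectly again) gives ("remove", 0).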

I don’t know whether this is the best strategy. But I don’t listen to the recordings I have made. Of course, I keep this advice in mind.

1. Let’s take a look into /media/Hitachi/simon-xp-wav:

[Screenshot: taiwans]

2. You can see the wav file Taiwans_Taiwans_Gehirns_Gehirns_Gehältern_Gehältern_Geistern_Geistern_S1_2009-07-19_20-20-12.wav. It contains two words (red area: Taiwans_Taiwans) that aren’t part of my active vocabulary any more. Obviously, this wav file wasn’t deleted by simon. Why not? Is the second part of this wav file still being used for training (green area: Gehirns_Gehirns_Gehältern_Gehältern_Geistern_Geistern)?

3. I forgot to insert this number, sorry. *smile*

4. Take a look into /home/liberty/.kde/tmp-liberty-desktop/simond/a/compile/mfcs.
5. You can see 8 mfc files that contain the word Taiwans.
6. Six of these eight mfc files contain words that are probably still in use. After compiling the speech model again, all 8 mfc files are still available. Do they influence the speech model?
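To find such leftover recordings without looking through the folder by hand, the file names themselves can be taken apart. The sketch below assumes the naming pattern seen in this post: words joined by underscores, then a tag such as S1 (whose exact meaning I don’t know), then the date and time. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed pattern: <word>_<word>_..._S<number>_<YYYY-MM-DD>_<HH-MM-SS>.wav
SAMPLE_NAME = re.compile(
    r"^(?P<words>.+)_(?P<tag>S\d+)_"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.wav$"
)


def parse_sample_name(file_name):
    """Split a recording's file name into words, tag and timestamp.

    Returns None for names that do not follow the assumed pattern,
    for example imported files such as de27-02.wav.
    """
    match = SAMPLE_NAME.match(file_name)
    if match is None:
        return None
    return {
        "words": match.group("words").split("_"),
        "tag": match.group("tag"),
        "recorded": datetime.strptime(
            match.group("stamp"), "%Y-%m-%d_%H-%M-%S"
        ),
    }


def words_outside_vocabulary(file_name, vocabulary):
    """List the words in file_name that are not in vocabulary (no repeats)."""
    parsed = parse_sample_name(file_name)
    if parsed is None:
        return []
    outside = []
    for word in parsed["words"]:
        if word not in vocabulary and word not in outside:
            outside.append(word)
    return outside
```

This only inspects file names. It does not answer whether simon still uses the rest of such a recording for training.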

Drag and drop a simon wav file into Totem

Tuesday, May 26th, 2009

I recently recorded some wav files with simon. I think that they are in 16 kHz format. It is possible to drag and drop such a wav file into Audacity (and play it there). But when I try to drop the wav file into Totem, I get an error message:

[Screenshot: encountered]

What is the reason for this? Does the file Aufforderung_S1_2009-05-25_11-17-48.wav contain something that isn’t compatible with Totem / GStreamer?
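One way to narrow this down is to read the header of the wav file and check the sample rate, the number of channels and the sample width. Here is a minimal sketch using Python’s standard wave module; the function name describe_wav is made up. The wave module raises wave.Error for files it cannot read as PCM wav, which would itself be a hint, but this sketch cannot say what Totem / GStreamer objects to.

```python
import wave


def describe_wav(path):
    """Read the basic header fields of a PCM wav file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }
```

A sample_rate_hz of 16000 would confirm the 16 kHz guess.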

I would like to know this because I am thinking about converting my 48 kHz FLAC files into 16 kHz WAV format with sox. I could drop these files into the folder in which simon stores the wav files (of course, I would have to rename the files so that simon knows what the utterance/transcription of each wav file is).
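The conversion could look roughly like the sketch below. It only builds and runs a sox command of the form "sox input.flac -r 16000 output.wav", and it assumes that the installed sox was built with FLAC support. The function names and the example folder names are made up, and I have not tested the result with simon.

```python
import subprocess
from pathlib import Path


def sox_resample_command(source, target_dir, rate_hz=16000):
    """Build the sox call that writes <target_dir>/<name>.wav at rate_hz."""
    source = Path(source)
    target = Path(target_dir) / (source.stem + ".wav")
    # "-r" placed before the output file sets the output sample rate.
    return ["sox", str(source), "-r", str(rate_hz), str(target)]


def resample(source, target_dir, rate_hz=16000):
    """Run sox; raises CalledProcessError if the conversion fails."""
    subprocess.run(sox_resample_command(source, target_dir, rate_hz), check=True)
```

For example, sox_resample_command("flac/de27-02.flac", "wav") yields a command that writes de27-02.wav into the wav folder.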