Posts Tagged ‘dictation’

Ralf’s German speech model 0.1.1

Monday, July 19th, 2010

Ralf’s German speech model 0.1.1 is available for download (3.7 MB, GPLv3).

With this speech model, you can dictate 5000 different German words.

Paths to the files on my computer:
file:///tmp/kde-ubuntudjJ1e3/simond/default/compile/hmm24/hmmdefs
file:///tmp/kde-ubuntudjJ1e3/simond/default/compile/tiedlist
file:///tmp/kde-ubuntudjJ1e3/simond/default/compile/hmm24/macros
file:///tmp/kde-ubuntudjJ1e3/simond/default/compile/stats
file:///home/ubuntu/Documents/201007/ipa-prompts/german-dictation-scenario.xml

Please read the previous articles if you are a simon newbie.

I didn’t check whether Ralf’s German speech model 0.1.1 works if you import it as a static model. But a few seconds ago I dictated the following words from section xda with my headset:

Benutzermodus Benutzerhandbuch Benutzungshinweisen Beobachtermission Bertha Bestrafungen Betreiberfirma benutzbar benützende berstenden bespannender bespieltest besprochener beteiligst beunruhigende beäugter brenzliger

All words were recognized correctly. At the moment, most (but not all) words begin with the letters a or b. I will add more words to Ralf's German speech model over the next few weeks.

My next goal is to reach the 10,000-word mark.

Ralf’s German speech model 0.1.1 contains words from Ralf's German IPA FLAC files (sections xaa, xab, xac, xad, xba, xca, xda). I am planning to add more sections to the speech model in the near future.

Video: Recognize 200 German words

Sunday, December 27th, 2009

Download the video: Recognize 200 German words under Ubuntu (30 MB; 13 minutes; the video will be replaced from time to time as soon as I have trained more words).

100 % of the words were recognized correctly. All words that I am dictating in this video are included in Ralf's German dictionary.

This video proves:
- simon works well under Ubuntu 9.10 (64-bit);
- Ralf's German dictionary allows good recognition results;
- error-free recognition (100 %) is possible (I didn’t expect such a good result).

Each word in this dictation video has been trained 3 times.

Not (yet) a dictation program

Sunday, July 26th, 2009

I just read that simon is not (yet) a dictation program. I hope that is going to change some day.

Speaker-independent German acoustic model

Tuesday, April 28th, 2009

Well, in my opinion we need a speaker independent German acoustic model. There are some training utterances collected by VoxForge. The question is: how can we process these German utterances with Simon? Is there a possibility to import utterances (automatically)?

I don’t want to train each word separately with Simon. I would like to import utterances that have been collected by VoxForge.

For me, it’s no problem to produce lots of training utterances. I have produced and published more than 10,000 training utterances in the German language. If only I knew how to process them.

Let me give you a short insight into how I produce training utterances: First, I dictate lots of utterances with DNS 9 Preferred in German (so there isn’t a problem with copyright since I am the author). Afterwards, I record these sentences with Audacity and export them (File > Export Multiple), about 25 utterances at a time. So this is a pretty productive approach to get lots of training material. But the problem is: How can I process this training material?
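One possible way to organise such a batch, as a minimal sketch: pair each exported audio file with its sentence in a prompts-style list, one line per recording with the file ID followed by the transcript. The file names, the sentences and the `build_prompt_lines` helper are hypothetical, and whether Simon can import such a list is exactly the open question.

```python
from pathlib import Path


def build_prompt_lines(audio_files, sentences):
    """Pair each audio file (sorted by name) with one sentence."""
    files = sorted(Path(f) for f in audio_files)
    if len(files) != len(sentences):
        raise ValueError("need exactly one sentence per audio file")
    # One line per recording: file name without extension, then the transcript.
    return [f"{f.stem} {text.strip()}" for f, text in zip(files, sentences)]


# Hypothetical file names and sentences, for illustration only.
example_files = ["de-0002.flac", "de-0001.flac"]
example_sentences = ["Guten Morgen", "Das ist ein Test"]
prompt_lines = build_prompt_lines(example_files, example_sentences)
print("\n".join(prompt_lines))
```

The sentences must be listed in the same order as the sorted file names, which matches the numbered files that Export Multiple produces.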

Training: the recording failed

Thursday, April 23rd, 2009

I would like to follow the approach described in the simon handbook. So I tried to record the word ‘Computer’ after selecting the corresponding pronunciation from the shadow dictionary. But the recording failed:

recording-failed

I don’t know how to get the proper permission. I know the trick gksudo nautilus, but this isn’t a solution to my problem here. How can I record a word with simon? Recording with Audacity (in combination with JACK) isn’t a problem and works fine.

Is there a way to record words with Audacity and import them into simon?

Package installer: included files

Wednesday, April 22nd, 2009

I just installed the simon deb file on my Ubuntu computer. Before that, I used the Synaptic Package Manager to remove a previous version of simon.

The Package Installer displays the following included files:

usr/
usr/bin/
usr/bin/simond
usr/bin/juliusexe
usr/bin/adintool
usr/bin/mkfa
usr/bin/ksimond
(more…)

Simon 0.2-rc1 available

Wednesday, April 22nd, 2009

I just saw that simon 0.2-rc1 is available. And now I want to take a look at the release notes:

“It includes a new Czech translation and the beginning of a Spanish one.”

Spanish, that sounds good – I am learning Spanish at the moment. A year ago, I tried to dictate in Spanish using the Windows Vista Speech Recognition engine. But I sold my copy of Vista because of poor performance on my computer. I hope that in a few years or so there will be an open source alternative.

Another thing: I read that there was an accident. I wish you a quick recovery. Get well soon!

I think that I will try to install simon-0.2rc1-Linux.deb on my Ubuntu machine.

ksimond launches simond and creates tray icon

Sunday, March 15th, 2009

I just learned about the following detail:

“When you start ksimond it will by default also launch simond and create a tray icon”

So, simon is the client, and simond is the server. And what is the role of ksimond? It seems to launch simond. But why is this necessary? Obviously, the function of ksimond is to create a tray icon. When I start just simond, there isn’t a tray icon on my Ubuntu machine. So ksimond seems to be good for the usability.

English: Tips & Tricks

Wednesday, March 11th, 2009

I just took a look at English: Tips & Tricks. When collecting pronunciations for the dictionary acquisition project, I think about the same question: a lot of German words could be composed of the pronunciations of shorter words.
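To illustrate the idea, here is a minimal sketch in Python: it splits a compound into known shorter words and concatenates their pronunciations. The tiny lexicon and its phoneme strings are made up for the example (they are not taken from the dictionary), and real compounds need more care, for example linking elements such as the "s" in "Benutzungshinweisen".

```python
def compose_pronunciation(word, lexicon):
    """Return a pronunciation built from shorter lexicon entries, or None.

    Tries every split point recursively, longest first part first.
    """
    word = word.lower()
    if word in lexicon:
        return lexicon[word]
    for cut in range(len(word) - 1, 0, -1):
        head, tail = word[:cut], word[cut:]
        if head in lexicon:
            rest = compose_pronunciation(tail, lexicon)
            if rest is not None:
                return lexicon[head] + " " + rest
    return None


# Hypothetical mini lexicon; the phoneme strings are illustrative only.
lexicon = {
    "benutzer": "b @ n U ts 6",
    "modus": "m o: d U s",
    "handbuch": "h a n t b u: x",
}

print(compose_pronunciation("Benutzermodus", lexicon))
print(compose_pronunciation("Benutzerhandbuch", lexicon))
print(compose_pronunciation("Bertha", lexicon))  # no split into known words
```

Words that cannot be split into known parts come back as None and would still need a hand-written pronunciation.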

Drag and drop words from the shadow dictionary

Monday, February 9th, 2009

I just found out that it is possible to drag and drop words from the shadow dictionary to the right column:

drag the word to the column on the right to train it
Simon allows you to drag and drop words for training

That’s a pretty nice feature. By the way, the shadow dictionary has about 8000 entries. It was generated a few hours ago by importing the German PLS dictionary.

Thoughts about XML:

In my opinion, the open source speech recognition development community needs to shift more to XML-based standards. PLS is one important standard for speech recognition. For the training of utterances (prompts), an SSML interface should be developed.

So, Simon allows the import of a PLS dictionary. My wish is that a future version of Simon would be able to import English and German XML documents that are similar to the SSML standard. Let me explain: I try to follow the PLS standard and the SSML standard. But when I discover problems, I prefer to rely just on XML.
To be more concrete: PLS documents should have the ending ‘.pls’ (I think that I have read that somewhere on the internet). But I prefer to use the suffix ‘.xml’. The suffix ‘.pls’ could be interpreted as ‘playlist’. This ambiguity should be avoided. So the suffix ‘.xml’ seems to be better.
And what about SSML? I don’t know whether SSML allows the use of FLAC audio files. But I prefer FLAC over WAV. So I am using just FLAC.
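For readers who have not seen PLS: a lexicon is a list of lexeme entries, each with a grapheme (the spelling) and one or more phoneme elements (the pronunciation). As a rough sketch, such a document can be read with Python's standard library. The sample entry and its IPA string are my own illustration, not taken from the dictionary, and this is not how Simon itself imports PLS.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Illustrative sample document; the entry is not from the real dictionary.
SAMPLE_PLS = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="de">
  <lexeme>
    <grapheme>Computer</grapheme>
    <phoneme>kɔmˈpjuːtɐ</phoneme>
  </lexeme>
</lexicon>
"""


def read_pls(xml_text):
    """Return a dict mapping each grapheme to its list of phoneme strings."""
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", PLS_NS)]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            entries.setdefault(grapheme.text, []).extend(phonemes)
    return entries


entries = read_pls(SAMPLE_PLS)
print(entries)
```

Note that the parser does not care whether the file ends in ‘.pls’ or ‘.xml’; only the content matters.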

XML provides solutions to a lot of problems of artificial intelligence. Speech recognition is just a special case of artificial intelligence. Think of the world as a giant graph – XML helps us to describe this graph. XML documents can be read by humans and can be processed by specialised software (e.g. Simon can import PLS documents).

My goal is to get an open source dictation system. I think that we can reach this goal by shifting to PLS (already done) and to SSML (has to be done).

At present, neither VoxForge nor Simon uses the advantages of SSML. But for the training of our speech models, SSML would increase the usability for the end user.

Simon currently allows you to download XML texts from the internet:

internet extensions
Download training text from the internet

This extension is just for text. But there should be an extension for XML documents that provide access to FLAC files. Such an extension for the import of SSML/XML documents from the internet would be very useful.
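To make the wish more concrete, here is a sketch of what such an import could look like. It assumes a hypothetical prompts document that borrows the SSML `audio` element, with the recording in the `src` attribute and the transcript as element content (in SSML this content is the fallback used when the audio cannot be played). The URLs are placeholders, and nothing like this exists in Simon today.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name of the SSML audio element.
SSML_AUDIO = "{http://www.w3.org/2001/10/synthesis}audio"

# Hypothetical prompts document; file names and URLs are placeholders.
SAMPLE_PROMPTS = """<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="de">
  <audio src="http://example.org/flac/de-0001.flac">Benutzermodus</audio>
  <audio src="http://example.org/flac/de-0002.flac">Benutzerhandbuch</audio>
</speak>
"""


def read_prompts(xml_text):
    """Return (audio URL, transcript) pairs from an SSML-like document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for audio in root.iter(SSML_AUDIO):
        src = audio.get("src")
        text = (audio.text or "").strip()
        # Skip entries that lack either the recording or the transcript.
        if src and text:
            pairs.append((src, text))
    return pairs


prompt_pairs = read_prompts(SAMPLE_PROMPTS)
for url, text in prompt_pairs:
    print(url, text)
```

An import extension would then download each FLAC file and add it to the training data together with its transcript.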