Ralf’s Interlingua dictionary

January 10th, 2012 by producer

This article explains how I created the dictionary and what the imported result looks like in simon.

A. Creation of the PLS dictionary:

1. Get spelling dictionary.
2. Check the license: it is GPL. The file README_en.txt says:

This spell check dictionary for Interlingua is licensed under GPL. [...] This hyphenation rules for Interlingua are licensed under GPL.

This means that I can use this spelling dictionary as a source.
3. Extract dict-ia-2010-11-29.oxt.
4. ISO 639-1 language code is ia.
5. Probably I will use this table for grapheme to phoneme conversion.

6. Check the encoding of ia_iso.aff and ia_iso.dic. Both files are encoded in ISO 8859-1. It is probably best to convert both files to UTF-8:
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.dic > interlingua-utf8.dic
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.aff > interlingua-utf8.aff

Change the first line of interlingua-utf8.aff to SET UTF-8. Both files use CRLF line endings (Windows style). I don’t know whether the unmunch command can cope with this, so I try it:

ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ unmunch interlingua-utf8.dic interlingua-utf8.aff > interlingua-wordlist

It worked. The source files contain CRLF line endings, but the target file contains only LF (Unix style). There are a lot of duplicate entries. I expect that a later .xsl script will remove them.

7. Add lexicon tags at the beginning and the end of interlingua-wordlist.

8. Create XML file:

ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ saxonb-xslt -s:interlingua-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:interlingua.xml

9. Create PLS dictionary:

ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ saxonb-xslt -s:interlingua.xml -xsl:'improve-interlingua.xsl' -o:interlingua-dictionary.xml
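The manual clean-up around steps 6 and 7 (line endings, duplicate entries, lexicon tags) could also be scripted. The following is only a sketch, not the procedure used above: the function name prepare_wordlist is made up for illustration, and it assumes that the unmunch output contains one word per line.

```shell
#!/bin/sh
# Sketch only: prepare a word list for the create-xml-file.xsl step.
# prepare_wordlist is a hypothetical helper name, not an existing tool.
prepare_wordlist() {
    # $1 = word list produced by unmunch (one word per line)
    printf '<lexicon>\n'
    # tr removes any CR left over from CRLF line endings.
    # awk skips empty lines and prints each word only the first
    # time it appears, so the original order is preserved.
    tr -d '\r' < "$1" | awk 'NF && !seen[$0]++'
    printf '</lexicon>\n'
}
```

It could be called like this: prepare_wordlist interlingua-wordlist > interlingua-wordlist-tagged (the output file name is just an example). Removing duplicates this early only saves work for the later .xsl steps. It does not replace them.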

B. Download the dictionary. Import it into simon.

The left column contains the words. The pronunciation column contains the corresponding SAMPA transcriptions. The Category column contains just “Unknown” entries.

Now you know how I created the dictionary and what the result looks like in simon.

Ralf’s Arabic dictionary

January 10th, 2012 by producer

This article explains the creation of an Arabic PLS dictionary and what the result looks like in simon.

A. Creation of the dictionary:

1. Get Arabic spelling dictionary.
2. Check the license. Inside the file dict_ar-3.0.oxt there is a file with the name COPYING (in the docs folder). It says in the file:

GPL 2.0/LGPL 2.1/MPL 1.1 tri-license

This means that I can use this tri-licensed spelling dictionary as source for my future GPLv3 PLS dictionary.

3. Now I have to extract dict_ar-3.0.oxt.
4. Let’s try the unmunch command inside the Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Arabic$ unmunch ar.dic ar.aff > arabic

It failed. I wasn’t able to unmunch the word list.
5. I have to remove all numbers from ar.dic. This can be done with the sed command:

sed 's/[0-9]*//g' ar.dic > arabic-without-numbers

6. Remove the slash (“/”) from arabic-without-numbers with Geany.
7. Add lexicon tags at the beginning and the end of the file.
8. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Arabic$ saxonb-xslt -s:arabic-without-numbers -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:arabic.xml

9. ISO 639-1 language code is ar.
10. Maybe I will use this table for the grapheme to phoneme conversion.
11. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Arabic$ saxonb-xslt -s:arabic.xml -xsl:'improve-arabic.xsl' -o:arabic-dictionary.xml

I have to remove the number sign (“#”) from arabic.xml with Geany.
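Steps 5 and 6 (removing the numbers with sed and the slashes with Geany) could be combined into one command. This is a sketch under an assumption, not the procedure used above: it treats everything after the first slash as an affix flag, which is the usual layout of a Hunspell .dic file. The name strip_flags is hypothetical. Like the manual steps, it keeps only the base entries and does not expand any affixes.

```shell
#!/bin/sh
# Sketch only: reduce a Hunspell .dic file to a plain word list.
# strip_flags is a hypothetical helper name.
strip_flags() {
    # $1 = .dic file (first line is usually the entry count)
    # 1. cut each line at the first "/" (drops the affix flags)
    # 2. delete remaining ASCII digits (e.g. the entry count)
    # 3. drop lines that are empty after the clean-up
    sed -e 's|/.*$||' -e 's/[0-9]//g' -e '/^$/d' "$1"
}
```

For example: strip_flags ar.dic > arabic-without-numbers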

B. Download the dictionary. Import it into simon.

The left column contains 457089 Arabic words. The pronunciation column contains the corresponding SAMPA transcriptions. The third column contains just entries with “Unknown”. This is because the PLS dictionary contains no role attributes.

Now you know how I created the dictionary and what the result looks like in simon.

Ralf’s Hebrew dictionary

January 10th, 2012 by producer

In 2009, I made some initial tests with Hebrew. Now it is time to develop a Hebrew PLS dictionary that is much bigger than the sample dictionary from 2009 (which I have deleted). This article explains how I created the dictionary and what the result looks like when imported into simon.

A. Creation of the dictionary:

1. Get Hebrew spelling dictionary from OpenOffice.org.
2. License is GPL. There is a copyright notice inside the file he_IL.aff.

3. I tried to unmunch the dictionary in the Ubuntu terminal, but unfortunately I failed:

ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ unmunch he_IL.dic he_IL.aff > hebrew-test

4. The source file he_IL.dic contains a lot of numbers. I remove them with the Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ sed 's/[0-9]*//g' he_IL.dic > hebrew-without-numbers

With Geany, I remove the commas (“,”) and slashes (“/”) that are still in the file hebrew-without-numbers. Now I have a clean word list with 43,000 Hebrew words.

5. Add lexicon tags at the beginning and the end of hebrew-without-numbers.
6. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ saxonb-xslt -s:hebrew-without-numbers -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:hebrew.xml

7. ISO 639-1 language code is he.
8. I need a table for grapheme to phoneme conversion. Maybe I will use this table. There are several tables available at Wikipedia. I am not sure which one I should use. I have an idea: as far as I know, Yiddish and Hebrew share the same alphabet. This means I could try to use the Yiddish improve-yiddish.xsl style sheet:

ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ saxonb-xslt -s:hebrew.xml -xsl:'/home/ubuntu/Documents/2011-II/Yiddish/dictionaries/improve-yiddish.xsl' -o:hebrew-dictionary.xml

The result is that most Hebrew letters have been converted into IPA. There is only one Hebrew letter that hasn’t been converted: [א]. I will add this phone to a copy of the .xsl style sheet named improve-hebrew.xsl. Now I try it again:

ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ saxonb-xslt -s:hebrew.xml -xsl:'improve-hebrew.xsl' -o:hebrew-dictionary.xml

The result is not so good: Maybe I should adjust the grapheme to phoneme conversion rules for modern standard Israeli Hebrew. Or is this not necessary? I think for a first draft I can use the Yiddish transformation rules.

B. Download the dictionary. Import it into simon as shadow dictionary.

Take a look at the result: The left column contains 43933 Hebrew words. The pronunciation column contains the corresponding SAMPA transcriptions. The category column is unused (or to be more exact: displays just Unknown) since the source PLS dictionary contains no role attributes.

Now you know how I created the dictionary and what the result looks like in simon. This dictionary uses more or less Yiddish pronunciation because I was too lazy to adjust it to modern standard Israeli Hebrew. It shouldn’t be a problem to adjust the style sheet improve-hebrew.xsl so that the phoneme results are better.

Ralf’s Belarusian dictionary

January 9th, 2012 by producer

This article explains how I created this PLS dictionary and what the imported result looks like.

A. Creation of the Belarusian PLS dictionary:

1. Get spelling dictionary. I choose the official orthography.
2. License is LGPL (see hyph_be_BY.dic). I am allowed to “convert any LGPLed piece of software into a GPLed piece of software.” I did this before. And I will do it again. This means that I get a spelling dictionary that is licensed under the LGPL. And I will produce a pronunciation dictionary that is licensed under the GPLv3. By the way, all my dictionaries are licensed under the GPLv3.
3. Extract dict-be-official.oxt.

4. The file be-official.aff is encoded in UTF-8. The file be-official.dic may be encoded in ISO-8859-1. At least Geany displays this encoding. However, I believe that be-official.dic is actually encoded in microsoft-cp1251. I have seen this encoding before (Macedonian and Bulgarian).
Now it is time to use the Ubuntu terminal:
cd /home/ubuntu/Documents/2011-II/Belarusian
iconv -f cp1251 -t UTF-8 <be-official.dic >belarusian-utf8.dic

The text file belarusian-utf8.dic looks fine.

5. Now I change the line SET microsoft-cp1251 in the file be-official.aff into SET UTF-8
6. I don’t know whether the next step is necessary. I could convert the file hyph_be_BY.dic from cp1251 into UTF-8. At the moment, I skip this step.

7. Ubuntu terminal:

unmunch belarusian-utf8.dic be-official.aff > belarusian-wordlist

I think that this step wasn’t necessary. It didn’t extract the word list. At the moment, I have a word list of 1.5 million words. This is far too many. The target size is 400,000 words.

8. I have to reduce the dictionary size. I found a tip. Ubuntu terminal:

sed -n 'p;N;N;N' belarusian-wordlist > belarusian-wordlist-reduced

Yes, it worked. The command keeps only every fourth line, so the word list now contains 391,000 words. This is a good basis for a PLS dictionary.

9. Add lexicon elements at the beginning and the end of belarusian-wordlist-reduced.
10. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Belarusian$ saxonb-xslt -s:belarusian-wordlist-reduced -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:belarusian.xml

11. Language code is be.
12. I will use this table for grapheme to phoneme mapping.
13. Creation of the phoneme elements:

ubuntu@ubuntu:~/Documents/2011-II/Belarusian$ saxonb-xslt -s:belarusian.xml -xsl:'improve-belarusian.xsl' -o:belarusian-dictionary.xml
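The sed command in step 8 prints one line and then swallows the next three, so only lines 1, 5, 9 and so on survive. The same thinning can be written with awk, which makes the step size adjustable. This is only a sketch: the function name keep_every_nth is hypothetical and not part of the workflow above.

```shell
#!/bin/sh
# Sketch only: keep every n-th line of a word list, starting with line 1.
# keep_every_nth is a hypothetical helper name.
keep_every_nth() {
    # $1 = step size (4 mirrors sed -n 'p;N;N;N'), $2 = input file
    # (NR - 1) % n is 0 for lines 1, n+1, 2n+1, ...
    awk -v n="$1" '(NR - 1) % n == 0' "$2"
}
```

For example: keep_every_nth 4 belarusian-wordlist > belarusian-wordlist-reduced. Afterwards, wc -l belarusian-wordlist-reduced shows whether the target size has been reached. Thinning by line number is arbitrary: it does not prefer common words over rare ones.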

B. Download and import the dictionary.

Let’s take a look at the result. The left column contains 391669 Belarusian words. The pronunciation column contains the corresponding SAMPA transcriptions. All entries in the third column are marked as “Unknown”. This is because the Belarusian PLS dictionary doesn’t contain any role attribute.

Now you know how I created the dictionary, and you have an impression of what the result looks like when imported into simon.

Ralf’s Asturian dictionary

January 5th, 2012 by producer

This article explains how I created the Asturian PLS dictionary and says a few words about the import into simon.

A. How I create the dictionary:
1. Get spelling dictionary.
2. Check license. It is GPLv3.
3. Extract asturianu.oxt.
4. Language code is ast.
5. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ unmunch ast.dic ast.aff > asturian-wordlist

The result is a file of 70MB with more than 5 million words. This word list is too big. I should reduce it. I had the same problem with my Latin dictionary. I had to reduce the size.

6. Add lexicon elements at the beginning/end of asturian-wordlist.

7. Generate .xml document with lexicon, lexeme and grapheme elements:

ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:asturian.xml

I got an error message (“Java heap space”) because Java ran out of memory. I should either reduce the file size with grep or install VisualVM. I think I will work with grep:
a. Remove lines that begin with l': ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ grep -v ^l\' asturian-wordlist > asturian-wordlist-02
b. Remove lines that begin with t': grep -v ^t\' asturian-wordlist-02 > asturian-wordlist-03
c. Remove lines that begin with s': grep -v ^s\' asturian-wordlist-03 > asturian-wordlist-04
d. Remove lines that begin with m': grep -v ^m\' asturian-wordlist-04 > asturian-wordlist-05
e. Remove lines that begin with n': grep -v ^n\' asturian-wordlist-05 > asturian-wordlist-06
f. Remove lines that begin with d': grep -v ^d\' asturian-wordlist-06 > asturian-wordlist-07
g. Remove lines that begin with qu': grep -v ^qu\' asturian-wordlist-07 > asturian-wordlist-08
h. Remove lines that begin with p': grep -v ^p\' asturian-wordlist-08 > asturian-wordlist-09
The word list now contains 1.1 million words. I think that this number is acceptable.

8. And now Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian-wordlist-09 -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:asturian.xml

This command creates a PLS dictionary without phoneme elements. The phoneme elements will be added later.

9. I will use this table for grapheme to phoneme conversion. Here is the command that creates the phoneme elements:

ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian.xml -xsl:'improve-asturian.xsl' -o:asturian-dictionary.xml

10. I tried to import the resulting dictionary into simon. Unfortunately, simon stopped responding after the import had finished. I assume that the dictionary is still far too big. I have to reduce its size again.
a. Remove lines that contain 'l: grep -v \'l asturian-wordlist-09 > asturian-wordlist-10
b. Continue to reduce the size of the word list: grep -v ylu asturian-wordlist-10 > asturian-wordlist-11
c. This isn’t enough, I have to remove about 80,000 words: grep -v les asturian-wordlist-11 > asturian-wordlist-12
d. Remove 136,000 words: grep -v mos asturian-wordlist-12 > asturian-wordlist-13
e. Remove 67,000 words: grep -v los asturian-wordlist-13 > asturian-wordlist-14
f. Remove 265,000 words: grep -v es asturian-wordlist-14 > asturian-wordlist-15
As you can see, it is a lot of work to get a dictionary size that is suitable for simon. At the moment, the word list contains 539,000 words. Is this number OK, or should I continue to reduce the size? I think that I will try the import again. First, I create the .xml file again:

ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian-wordlist-15 -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:asturian.xml

And now I repeat step 9. The file asturian-dictionary.xml has a size of 45 MB. I hope that this size is OK.
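The two rounds of grep commands in steps 7 and 10 could be collapsed into a single filter with one extended regular expression. This is a sketch, not the commands that were actually run: the function name filter_wordlist is hypothetical, and it assumes that the word list uses the plain ASCII apostrophe (').

```shell
#!/bin/sh
# Sketch only: apply all the grep -v filters from steps 7 and 10 at once.
# filter_wordlist is a hypothetical helper name.
filter_wordlist() {
    # $1 = word list produced by unmunch
    # First alternative: words that begin with l', t', s', m', n', d', qu' or p'.
    # Remaining alternatives: words that contain 'l, ylu, les, mos, los or es.
    grep -v -E "^(l|t|s|m|n|d|qu|p)'|'l|ylu|les|mos|los|es" "$1"
}
```

For example: filter_wordlist asturian-wordlist > asturian-wordlist-15. Because "es" also matches every word that contains "les", the les alternative is redundant. It is kept here only to mirror the individual steps.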

B. Download the dictionary. Import it into simon.

Take a look at the result. In the left column, you can see the Asturian words. This dictionary contains 539928 words. The right column contains the corresponding SAMPA transcriptions.

As you can see, it was a lot of work to reduce the size of the dictionary. At least it now has a size that isn’t too big for simon.

Ralf’s Yiddish dictionary

January 3rd, 2012 by producer

This article explains some details about the creation of the dictionary and what the result looks like in simon.

A. How I create Ralf's Yiddish dictionary:

1. Get spelling dictionary.
2. License is GPLv3.
3. Extract jidysz.net.ooo.spellchecker.oxt.
4. Ubuntu terminal:
cd /home/ubuntu/Documents/2011-II/Yiddish/dictionaries
sudo apt-get install hunspell-tools
unmunch yi.dic yi.aff > yiddish-wordlist

5. Add <lexicon> at the beginning of yiddish-wordlist. Add </lexicon> at the end of this file.
6. Generate .xml document with lexicon, lexeme and grapheme elements:

ubuntu@ubuntu:~/Documents/2011-II/Yiddish/dictionaries$ saxonb-xslt -s:yiddish-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:yiddish.xml

7. ISO 639-1 language code is yi.
8. I think I will use this table as source for the grapheme to phoneme mapping.
9. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/Yiddish/dictionaries$ saxonb-xslt -s:yiddish.xml -xsl:'improve-yiddish.xsl' -o:yiddish-dictionary.xml

B. Download the dictionary, and import it into simon.

Take a look at the result. The left column contains the Yiddish words. This dictionary contains 99980 words. The right column contains the corresponding SAMPA transcription.
Yiddish is written in the Hebrew alphabet. The Hebrew alphabet is written from right to left. Obviously, the corresponding SAMPA transcriptions are written from left to right. This means that the phoneme order should be fine.

There are a lot of other PLS dictionaries available. Find the PLS dictionary that suits your language.

Import of your Grammar

January 2nd, 2012 by producer

In this post I want to write some words about the Grammar / Import function of simon 0.3. Here is what I do:

1. Import Schott’s German dictionary as active dictionary into simon.

2. Open the Grammar tab. Press the Import button.

3. Simon starts a wizard. Press the Next button.

4. Let’s check the option Also import unknown sentences. I don’t know whether this is a good decision, so let’s give it a try.
This is interesting: “words with more than one terminal” – is it now possible to use more than one entry for the role attribute? The current version of Schott’s German dictionary employs just one entry for each role attribute. The PLS standard allows more entries.
Please, download and extract Schott’s German utterances. This compressed folder 15000-german-utterances.zip contains a plain text file with more than 15000 utterances. I am the author of these utterances, and I have licensed them under the GPLv3. The utterances are designed to be used in conjunction with Schott’s German dictionary (former name: Ralf’s German dictionary). Every word that is included within Schott’s German utterances should be included in Schott’s German dictionary, too. I am 99% sure that this is the case, but I can’t guarantee it. If some words are missing in Schott’s German dictionary, please inform me, and I will include them within the next version of Schott’s German dictionary.
You can import my German utterances using simon’s Import Text option (copy & paste).

5. The import has been completed. There are a lot of lines that contain the Unknown terminal. It would probably have been better if I hadn’t checked the option Also import unknown sentences in step 4.

6. Because simon stopped responding, I forced it to quit. I tried to start simon several times, but it wouldn’t start. Now several simon processes are displayed with the status zombie or sleeping. I was able to end these processes. But at the moment, it seems to be impossible to start simon again (a new zombie / sleeping process is created every time I try).

Conclusion: I don’t recommend checking the option Also import unknown sentences. I tried the Grammar import function before without checking this option. Simon reacted normally, and everything seemed to be fine.

Schott’s German dictionary 0.2.8

November 1st, 2011 by producer

Here is how I create Schott’s German dictionary 0.2.8 (with the style sheet improve-german.xsl):

1. Replace 152 matches:

<xsl:when test="contains(lower-case(../grapheme), 'planung')"><xsl:value-of select="replace($sierra, 'planʊŋ','plaːnʊŋ')"/></xsl:when>

2. Replace 178 matches:

<xsl:when test="contains(lower-case(../grapheme), 'fußball')"><xsl:value-of select="replace($sierra, 'fʊsbal','fuːsbal')"/></xsl:when>

A lot of other small changes have been made. Please, import Schott’s German dictionary (author: Kai Schott) into simon.
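To show what such a rule does, here is a small shell sketch that applies the first rule to a plain list instead of a PLS file. It is not part of improve-german.xsl: the function name apply_rule and the tab-separated layout (grapheme, tab, phoneme) are assumptions made only for this illustration.

```shell
#!/bin/sh
# Sketch only: mimic one xsl:when rule on a tab-separated word list.
# apply_rule and the grapheme<TAB>phoneme layout are hypothetical.
apply_rule() {
    # $1 = file with lines of the form: grapheme<TAB>phoneme
    awk -F '\t' -v OFS='\t' '
        # Same condition as the xsl:when test: the lower-cased
        # grapheme contains "planung".
        index(tolower($1), "planung") { gsub("planʊŋ", "plaːnʊŋ", $2) }
        # Print every line, changed or not.
        { print }
    ' "$1"
}
```

Only entries whose grapheme matches the condition are touched. All other transcriptions pass through unchanged, which is why one rule can correct 152 entries without side effects on the rest of the dictionary.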

German speech model ‘deep’

August 7th, 2011 by producer

Visit Schott’s German IPA FLAC files (section: deep) or Voxforge. Download and import the German speech model ‘deep’.

German speech model ‘deek’

August 3rd, 2011 by producer

Visit Schott’s German IPA FLAC files (section: deek) [source 1] or Voxforge [source 2]. Get the corresponding speech model [object], and import it into simon 0.3.

German speech model ‘deef’

August 1st, 2011 by producer

Visit Schott’s German IPA FLAC files (section: deef) [source 1] or Voxforge [source 2]. Get the corresponding speech model [object], and import it into simon 0.3. Watch my video about this speech model:

These are the words that were recognized in the video:

Randschicht Randproblem Randprobleme Randproblemen Randproblems Randpunkt Randsportart Randstellung Randträger Rechenvorrichtung Randwerbung Randwinkel Randträger Randzone Rangabzeichen Rangabzeichens Rangelei Rangfolgen Rangliste Ranglisten Rangordnung Rangordnungen Ranguns Rangstufe Rangstufen Rappe Ratte Rapport Rapporte Rapporten Rapports Rapsfeld Rapsfelder Rapsfeldern Rapsfelds Rasenfläche Rassenkämpfen Rasenloch Rasenplätze Rasenplatz Rasenplätze Rasenplätzen Rasensport Rasenstück Rasentraktor Rasentrimmer Raserei Rasereien Rasierapparat Rasierapparaten Rasierapparats Rasierer Rasierklinge Rasierpinsel Rasierschaum Rasierseife Rasierwasser Rasierzeug Rasmussen Raspe Raspeln Rassenhass Rassenhasses Rassenkampf Rassenkonflikt Rassenkrawall Rassenkunde Rassenkämpfe Rassenkämpfen Rassenmischung Rassenmischungen Rassenproblem Rassenprobleme Rassentrennung Rassenunruhen Rassepferd Rastalocken Rastblech Rastdorn Rastdorne Rasterbild Rasterdecke Rasterdruck Rasterelektronenmikroskop Rasterfahndung Rasterfahndungen Rastergestaltung Rastermaße Rastermaßen Rastermaßes Rastern Rasterpapier Rasterpunkt Rasterpunktabfühlung Rasterpunktlesen Rasters Rasterung Rasterweite Rasterweiten Rastkappe Rastkontakt Rastlappen Rastlosigkeit Rastmechanismus Rastmoment Rastmontage Rastplatz Rastplätze Rastplätzen Rastpunkt Raststätten Rasur Ratifikationen Ratifikationsurkunde Ratifikationsurkunden Ratifizierens Ratifizierung Ratifizierungen Ratifizierungsdebatte Rechnerbaugruppe Ratingen Ration Rationalisieren Rationalisierens Rationalisierung Rationalisierungen Rationalisierungsdruck Rationalismus Rationalist Rationalisten Rationalität Rationen Rationierens Rationierung Rationierungen Ratlosigkeit Rasanz Ratsamkeit Ratsbeschluss Ratsbeschlusses Ratschlag Ratschlags Ratspräsidenten Ratspräsident Ratsmitgliedern Ratsmitglieder Ratsmitglied Ratsherren Ratsche Ratssitzung Ratssitzungen Ratstisch Ratsvorsitz Ratsvorsitzende Randverbinder Rattenfleckfieber Rattenfänger Rattengift Rattenhaus 
Rattenkönig Rattenloch Rattenloches Rattenplage Rattenschwanz Rattermarken Ratzinger Rauchkammer Rauch Rauchabzug Rauchbombe Rauchens Rauchen Raucher Raucherinsel Rauchern Rauchgas Raucherzimmer Raucherzone Rauchers Rauchfahne Rauchfang Rauchfass Rauchfleisch Rauchgas Rauchgasfilter Rauchgasfühler Rauchgaskanal Rauchgaswäsche Rauchgaszug Rauchgenuss Rauchgenusses Rauchglas Rauchigkeit Rauchkammer Rauchmaschine Rauchmelder Rauchpilz Rauchschwaden Rauchsäule Rauchsäulen Rauchverbot Rauchvergiftung Rauchverzehrer Rauchverzicht Rauchvorhang Rauchwand braven Raudi Rauchpilz Raufbold Rauferei Raufhandel Rauflustigkeit Raufrost Rauheiten Rauheit Rauheitsbeiwert Rauheitsbestimmung Rauhut Raum Raumabtrennung Raumakustik Raumangst Raumanzug Raumaufteilung Raumaufteilungen Raumausstatter Raumbedarfs Raumbelastung Ratsmitgliedern Raumbereich Raumbereichen Raumbereichs Raumbuch Raumfahrtagentur Raumfahrtbehörde Raumtransport Raumtransport Raumfahrtkonzern Raumfahrtkonzerns Raumfahrtprogramm Raumfahrtsparte Raumfahrttechnik Raumfahrzeuge Raumfahrzeuge Raumflug Raumflugs Raumforscher Raumforschung Raumfähre Raumgeräuschpegel Raumgeschwindigkeit Raumgestaltung Raumgewicht Raumgruppe Raumhöhe Rauminhalt Raumkapsel Raumkelle Raumklima Raumladung Raumlufttechnik Raummangel Raummaß Raumordnung Raumordnungsverfahrens Raumpfleger Raumpflegerin Raumplanung Raumprogramm Raumrichtung Raumschiff Raumschiffen Raumschiffes Raumthermostat Raumverhältnis Raumverhältnisse Raumverhältnissen Raumverhältnisses Raumwinkel Raumzeitalter Raute Rauchwaren Raupenantrieb Raupenantrieben Raupenbagger Rauschebart Rauschens Rauschgift Rauschgiftdezernat Rauschgifte Rauschgiftsucht Rauschgiftverbot Rauschgold Rauschgoldengel Rauschmittel Rauschpegel Rauschpegels Rauschunterdrückung Rauschuntergrund Rauschuntergrunds Rausschmeißer Rauschgifts Rausschmisse Raupentrieb Rauchzeichen Randwähler Ravennas Reagenzglases Reagenzgläser Reagenzpapier Reaktionsmöglichkeiten

A lot of words were recognized correctly. That is not a bad result.

sudo make uninstall

August 1st, 2011 by producer

A few minutes ago in my Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/speech2text/build$ sudo make uninstall

Then I downloaded simon_0.3.0-1ubuntu8_amd64.deb and installed this version with the Ubuntu Software Center.

There is no PPA for Ubuntu 11.04

August 1st, 2011 by producer

There is no PPA for Ubuntu 11.04. How is it possible to install an old PPA (Maverick)? What is the command that I have to enter to install an old PPA?

Export test result with sam

August 1st, 2011 by producer

Yesterday, I installed Qwt 6, and then built simon 0.3.60. It was difficult, but in the end it worked out fine. And look, sam now offers an Export test result button (top right of the screenshot):

I want to export the following information: Filename, Expected result, Actual result, Recognition rate (below 50%). The resulting document should be a simple text file (or XML file or whatever). Is this possible with the current Export test result function of simon?

ppa.launchpad.net – where is natty?

July 31st, 2011 by producer

This is what I did a few minutes ago:

ubuntu@ubuntu:~/Documents/2011-II/speech2text$ sudo add-apt-repository ppa:grasch-simon-listens/simon

Then I typed into the Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/speech2text$ sudo apt-get update

And then the following message appeared:

[...] W: Failed to fetch http://ppa.launchpad.net/grasch-simon-listens/simon/ubuntu/dists/natty/main/source/Sources 404 Not Found

W: Failed to fetch http://ppa.launchpad.net/grasch-simon-listens/simon/ubuntu/dists/natty/main/binary-amd64/Packages 404 Not Found

Then I took a look into http://ppa.launchpad.net/grasch-simon-listens/simon/ubuntu/dists/. Obviously there isn’t a directory called natty.

I want to get the newest version of simon/sam. Where can I find it?
–> It is not here. This version is from October 2010.
–> At ppa.launchpad.net/grasch-simon-listens/ I find only versions made for lucid and maverick.
–> I wasn’t successful with git because some problem with qwt 6 came up.

I want to see whether the newest version of sam has an ‘Export test results’ button.

Remove the package “simon”

July 31st, 2011 by producer

I want to get the newest simon version via git. Here is what I do:

1. Ubuntu terminal:

sudo apt-get install git

2. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II$ git clone git://speech2text.git.sourceforge.net/gitroot/speech2text/speech2text

3. System > Administration > Synaptic Package Manager:
Remove the package “simon” (Mark for Complete Removal).
simon is now not visible any more in Synaptic. So obviously, it has been completely removed.

4. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/speech2text$ git pull origin master
From git://speech2text.git.sourceforge.net/gitroot/speech2text/speech2text
* branch master -> FETCH_HEAD
Already up-to-date.
ubuntu@ubuntu:~/Documents/2011-II/speech2text$

5. Ubuntu terminal:

ubuntu@ubuntu:~/Documents/2011-II/speech2text$ ./build_ubuntu.sh
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CMake Error at cmake/FindZLIB.cmake:25 (MESSAGE):
Could not find ZLIB
Call Stack (most recent call first):
julius/libsent/CMakeLists.txt:3 (find_package)

-- Configuring incomplete, errors occurred!
touch: cannot touch `./julius/gramtools/mkdfa/mkfa-1.44-flex/*': No such file or directory
ubuntu@ubuntu:~/Documents/2011-II/speech2text$

6. Question: What do I have to do to get simon going from git repository?

Edit 19.15: I am trying the following:

sudo apt-get install git-core build-essential cmake bison flex gettext gettext-kde kdeartwork \
kdelibs5-dev libxtst-dev libqt4-sql-sqlite qtmobility-dev libphonon-dev libattica-dev libattica0 zlib1g-dev \
portaudio19-dev

Edit 19.25:

ubuntu@ubuntu:~/Documents/2011-II/speech2text$ ./build_ubuntu.sh
-- Found Portaudio: /usr/lib/libportaudio.so
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so
-- Found Pthreads: /usr/lib/x86_64-linux-gnu/libpthread.so
-- Looking for Q_WS_X11
-- Looking for Q_WS_X11 - found
-- Looking for Q_WS_WIN
-- Looking for Q_WS_WIN - not found.
-- Looking for Q_WS_QWS
-- Looking for Q_WS_QWS - not found.
-- Looking for Q_WS_MAC
-- Looking for Q_WS_MAC - not found.
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so;/usr/lib/x86_64-linux-gnu/libXau.so;/usr/lib/x86_64-linux-gnu/libXdmcp.so
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so;/usr/lib/x86_64-linux-gnu/libXau.so;/usr/lib/x86_64-linux-gnu/libXdmcp.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Looking for include files CMAKE_HAVE_PTHREAD_H
-- Looking for include files CMAKE_HAVE_PTHREAD_H - found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Looking for _POSIX_TIMERS
-- Looking for _POSIX_TIMERS - found
-- Found Automoc4: /usr/bin/automoc4
-- Found Perl: /usr/bin/perl
-- Found Phonon: /usr/include
-- Performing Test _OFFT_IS_64BIT
-- Performing Test _OFFT_IS_64BIT - Success
-- Performing Test HAVE_FPIE_SUPPORT
-- Performing Test HAVE_FPIE_SUPPORT - Success
-- Performing Test __KDE_HAVE_W_OVERLOADED_VIRTUAL
-- Performing Test __KDE_HAVE_W_OVERLOADED_VIRTUAL - Success
-- Performing Test __KDE_HAVE_GCC_VISIBILITY
-- Performing Test __KDE_HAVE_GCC_VISIBILITY - Success
-- Found KDE 4.6 include dir: /usr/include
-- Found KDE 4.6 library dir: /usr/lib
-- Found the KDE4 kconfig_compiler preprocessor: /usr/bin/kconfig_compiler
-- Found automoc4: /usr/bin/automoc4
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found libsamplerate: /usr/lib/libsamplerate.so
-- Found ALSA: /usr/lib/libasound.so
-- Enabling resample support
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Found Qt-Version 4.7.2 (using /usr/bin/qmake)
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Enabling simon scenario support.
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Could NOT find KdepimLibs (missing: KdepimLibs_CONFIG) (Required is at least version "4.5.60")
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
CMake Error at cmake/FindQwt6.cmake:101 (MESSAGE):
Could not find Qwt 6.x
Call Stack (most recent call first):
sam/src/CMakeLists.txt:1 (find_package)

– Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found. Stop.
ubuntu@ubuntu:~/Documents/2011-II/speech2text$

And what now? What should I do now?

German speech model ‘deea’

July 30th, 2011 by producer

Visit Schott’s German IPA FLAC files (section: deea) (source 1) or Voxforge (source 2). Download the German speech model ‘deea’ (object). Watch my video about this speech model:


There are a lot of recognition errors that need to be fixed.


German speech model ‘dedv’

July 28th, 2011 by producer

Here is how I create the German speech model ‘dedv’:

1. Open the file de-dv. It contains a list of 1000 words (only the phonetic transcriptions).

2. Start Audacity.

3. Now I read every word in the list aloud, leaving a pause of 1-2 seconds between words. Later, Audacity will find these pauses automatically.

4. Mark the whole recording with a double-click.
5. Then select Analyze > Sound Finder…

6. Set the Label starting point to 1.0. Set the Label ending point to 1.0. Why? Because simon has to know the amount of background noise.

7. Let’s eliminate the error at position 237. There was some noise (above 26 dB, I think) at position 237, and not a word. Mark the area with the mouse. Then press the Silence button (see top right of the picture).

8. Let’s have a look at the text file de-dv, and at the audio file. The text file ends with line 841. The Audacity audio file ends with number 841. Both files correspond to each other.

9. Select Audacity > File > Export Labels… The position (starting point and ending point) of each label will be exported into a simple text file. Export the labels to a file named labels.txt.

10. Open labels.txt with Geany. The file labels.txt ends at line 841. The first number of each line indicates the label starting point. The second number indicates the label ending point. The third number indicates the label itself.
You can see in the picture that de-dv is open, too. Both files – labels.txt and de-dv – have a length of exactly 841 lines.
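The line counts can also be compared in the terminal instead of by eye. The following is only a minimal sketch: the helper name same_line_count and the two small sample files are made up for illustration, and the sample contents are placeholders, not the real data.

```shell
#!/bin/sh
# Sketch: check that two files have the same number of lines before merging.
# same_line_count is a hypothetical helper; the sample files are placeholders.

same_line_count() {
    # "wc -l < file" prints only the number; tr removes padding spaces.
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}

# Demo with two tiny sample files in a temporary directory.
demo_dir=$(mktemp -d)
printf '0.500000\t1.250000\t1\n2.750000\t3.400000\t2\n' > "$demo_dir/labels.txt"
printf 'label-one\nlabel-two\n' > "$demo_dir/de-dv"

if same_line_count "$demo_dir/labels.txt" "$demo_dir/de-dv"; then
    count_result="line counts match"
else
    count_result="line counts differ"
fi
echo "$count_result"
```

With the real files, the two arguments would be the paths to labels.txt and de-dv.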

11. Geany > Search > Replace.
Search for: \t\w+$ (\t means tab; \w means a word character, that is a letter, a digit or an underscore; + means this: “The plus sign indicates that there is one or more of the preceding element”; $ means end of line)
Replace with: (leave this field empty)
Don’t forget to mark Use regular expressions.
This procedure removes the third number from each line.
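The same result can probably be reached without Geany. The sketch below swaps the regular expression for the cut command and assumes that labels.txt is tab-separated with exactly three columns, as described in step 10. The sample values and the output name labels-2col.txt are invented for the demo.

```shell
#!/bin/sh
# Sketch: drop the third (label) column of a tab-separated label file.
# This uses cut instead of the Geany regular expression; the sample
# values and the output file name are placeholders.

demo_dir=$(mktemp -d)
printf '0.500000\t1.250000\t1\n2.750000\t3.400000\t2\n' > "$demo_dir/labels.txt"

# cut splits on tabs by default; -f1,2 keeps only the first two fields.
cut -f1,2 "$demo_dir/labels.txt" > "$demo_dir/labels-2col.txt"

cat "$demo_dir/labels-2col.txt"
```

Writing to a second file keeps the original export untouched in case something goes wrong.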

12. You can see that the third column has been removed thanks to the regular expression procedure.

13. Now it is time to merge both files: labels.txt should be merged with de-dv. This is done via the paste command in the Ubuntu terminal:

ubuntu@ubuntu:~$ paste /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/labels.txt /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/de-dv > /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/pasted.txt

The resulting document is named pasted.txt.
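To see what paste does, here is a small sketch with two sample files. The file contents are placeholders (the real de-dv contains phonetic transcriptions), and the temporary directory stands in for the long paths used above.

```shell
#!/bin/sh
# Sketch: merge a two-column label file with a list of transcriptions.
# paste joins line N of the first file with line N of the second file,
# separated by a tab. All sample contents are placeholders.

demo_dir=$(mktemp -d)
printf '0.500000\t1.250000\n2.750000\t3.400000\n' > "$demo_dir/labels.txt"
printf 'label-one\nlabel-two\n' > "$demo_dir/de-dv"

paste "$demo_dir/labels.txt" "$demo_dir/de-dv" > "$demo_dir/pasted.txt"

# Each line now has three tab-separated columns: start, end, label.
cat "$demo_dir/pasted.txt"
```

Because paste works strictly line by line, both input files need the same number of lines, otherwise the labels shift against the time stamps.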

14. You can see that the document pasted.txt has a third column: The labels are the phonetic transcriptions!

15. Now let’s go back to Audacity > File > Import > Labels… Take a look at the result. Each label is a phonetic transcription of the corresponding recording.

16. Audacity > File > Export Multiple…
Export format: FLAC files
Export location: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/flac-dedv
Split files based on: Labels
Name files: Using Label/Track Name
Press the Export button.

17. Now you know how I create the FLAC files that are part of Schott’s German IPA FLAC files.

18. Let’s generate a PLS dictionary that contains about 841 entries. This is done in the Ubuntu terminal:

ubuntu@ubuntu:~$ cat /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/german-0.2.7.xml | saxonb-xslt -ext:on -s:- -xsl:/media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/combine-0.2.4/compare.xsl

The result is a PLS dictionary at the following location: file:///media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/lexicon-dedv.xml

19. Now I need a prompts file. It is also generated in the Ubuntu terminal:

ubuntu@ubuntu:~$ saxonb-xslt -ext:on -s:/media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/german-0.2.7.xml -xsl:'/media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/combine-0.2.4/lexicon2prompts.xsl' -o:'/home/ubuntu/Documents/dummy.xml'

20. Now it is time to upload the package file:///media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/german-ipa-flac-files-dedv-20110727.tar.bz2 to Voxforge.

21. Delete file:///home/ubuntu/.kde/share/apps/simon and file:///home/ubuntu/.kde/share/apps/simond.

22. Start simon.

23. I am skipping the next steps. Please read my article German speech model ‘dedq’ to get more details.

24. And now it is time to watch the video about this speech model:

The following words were recognized in the video:

Orakelspruchs Orangensaft Orangensaftes Orangenschale Orangenschalenstruktur Orangenscheibe Orangensekt Orangerie Orangerien Orangerücken Oranienburger Oranienburgs Orchester Orchesterbegleitung Orchesterkanzel Orchestergraben Orchesterbesoldung Orchestermitglieder Orchestermusik Orchestermusiker Orchesterprobe Orchesterraum Orchester Ordensgelübde Ordensschwester Ordensschwestern Orderpapier Ordination Ordnungsbehörde Ordnungsbehörden Ordnungsmacht Ordnungssystem Orffs Organbank Organe Organell Organelle Organells Organhandel Organigramm Organigramme Organigrammen Organigramms Organik Organisationsabteilung Organisationsaufgabe Organisationsaufgaben Organisationsausschuss Organisationsbegabung Organisationseinheit Organisationserfahrung Organisationsfachmann Organisationsform Organisationsformen Organisationsgabe Organisationskomitee Organisationslösung Organisationslösungen Organisationsmethoden Organisationsplan Organisationsplanung Organisationspsychologie Organisationsreform Organisationsstruktur Organisationsteam Organisator Organisatoren Organisators Organisierung Organismus Organist Organistin Organographie Orgelbauer Orchesters Orgelbauers Orgelklang Orgelklangs Orgelkonzert Orgelkonzerte Orgelmusik Orgel Orgelpfeife Orgelton Orgelwerke Orientbrücken Orienthandel Orientierungskrise Orientierungskrisen Orientierungspunkte Orientierungsstufe Ornithologie Origami Originalantwortschein Originalantwortscheine Organigrammen Organstreit Originalausgabe Originalausgaben Originalbeleg Organstreit Originaldiskette Originalersatzteil Originalfassung Originalgehäuse Orgelkonzerte Originalität Organizismus Originalprüfunterlagen Organells Orderscheck Originalschecks Organhandel Organstreitverfahren Originalversion Originalverpackung Originalversion Organ Organstreit Orkanen Organstreit Orkanschadens Orkantiefs Orkantiefs Orlando Orlandos Orléans Ornamentband Ornamentbands Ornamentbänder Ornamentbändern Ornamente Ornaments Orographie Orographien Orpheus Ortbeton Orte 
Orten Ortens Ortgang Ortgangbrett Orthodoxie Orthodoxien Ostgeschäft Orthographie Orthographiefehler Orthographiefehlern Orthographiefehlers Orthografien Orthographie Ortholexikon Ortholexikons Orthonormalbasis Orthopäde Orthopäden Orthopädie Orthopädien Ortleb Ortolf Ostkirche Ortsbehörden Ortsbesichtigung Ortsbezeichnung Ortsbild Ortschaftsrats Ortschaftsräte Ortschaftsräten Ortsdurchfahrten Ortsfremde Ortsfremden Ortsgebühr Ortsgespräche Ortsgesprächen Ortsgrammatik Ortsgruppe Ortsgruppenleiter Ortskirchen Ostkredite Ortskrankenkassen Orchestern Ortsmitte Ortsname Ortsnamen Ortsnetz Ortsnetze Ortsnetzen Ortsnetzes Ortleb Ortssendern Ortsteil Ortsteilen Ortsteils Orchester Ortsvektoren Ortsverbandes Ortsverzeichnis Ortsveränderung Ortsvorsitzende Ortsvorsteher Ostdeutschland Ortszulage Ortungsgeräte Ortung Ostblock Ostblockes Osteolyse Ostfriesentee Ostallgäu Ostdeutschlands Ost-Berliner Ost-Berlins Ost-SPD Ostwestfale Ost-West-Konflikt Ostafrika Ostafrikas Ostalgie Ostasien Ostbahnhof Ost-Berlins Ostbesuche Ostbesuchen Ostbewohner Ostblock Ostblockländer Ostblockländern Ostblockreisen Ostblockstaaten Ostbündnis Ostbündnisse Ostbündnissen Ostbündnisses Ostdeutschland Ostelbien Ostfront Ostens Osteoporose Osteoporosen Ozeanriesen Osteuropäer Osteuropas Osteuropäern Osteuropäers Ostexport Ostexports Ostfalen Ostfildern Ostfilderns Ostgebiete Ostgeschäft Ostprovinz Ostpreußen Ostpreußens Ostteil Ostsee Ostseebäder Ostseehandel Ostseeheilbad Ostseeinsel Oszillator Oszillatoren Ostwirtschaft Ostsektor Ovation Ovationen

You can see that there are a lot of recognition errors.

25. Now you know how I created the German speech model ‘dedv’, and how well or badly it performs when used for recognition.

Fixing the problem with “setxkbmap de”

July 27th, 2011 by producer

Because I have problems with the German special characters like ö and ü, I am trying the following:

Type into the terminal “setxkbmap de” – and then start simon and ksimond.

By the way, my simon version is 0.3.0-1ubuntu8. I installed it a few months ago using this approach.

Yes, the German special characters are displayed correctly. I just dictated: “Mühlingen Müllern Mörike Mörtelgerüche Mühelosigkeit ” – great, it is working now.

German speech model ‘dedq’

July 26th, 2011 by producer

This article shows (A.) how I create the German speech model ‘dedq’, and (B.) how I dictate using this speech model.

A. Creation of the German speech model ‘dedq’

1. Delete file:///home/ubuntu/.kde/share/apps/simon and file:///home/ubuntu/.kde/share/apps/simond

2. Start simon.

3. Click the Vocabulary button. Press the Import Dictionary button. Select Target: Active Dictionary. Type of dictionary: PLS dictionary. The location of the PLS dictionary on my computer is as follows: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-ipa-flac-files-dedq-20110726/lexicon-dedq.xml
You can find this dictionary file at Voxforge (37 MB; you can extract the dictionary file). Or here is an easier way: you can download the file dedq.xml (right click; Save Link as). The file dedq.xml is a valid PLS dictionary that you can import into simon.

4. Press the Grammar button. Add sentence: Adjektiv. Add sentence: Substantiv. Add sentence: Zahlwort

5. Click the Training button. Import trainingsdata.
Import prompts:
- Prompts: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-ipa-flac-files-dedq-20110726/prompts-dedq
- Base directory: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-ipa-flac-files-dedq-20110726/flac-dedq

You can get both files from Voxforge (see link above). Please keep in mind that some post-processing has to be enabled.

Go to Settings > Configure simon… > Recordings > Post Processing. There you can see the post-processing command that causes sox to convert the FLAC files to WAV format.
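Outside of simon, the same kind of conversion can be sketched in the terminal. This is not the post-processing command from the simon settings, only an illustration. It assumes that sox is installed with FLAC support; the function name flac_to_wav, the dry-run argument and the directory names are made up for the demo.

```shell
#!/bin/sh
# Sketch: convert every FLAC file in a directory to WAV with sox.
# Assumptions: sox is installed with FLAC support. flac_to_wav and its
# optional third argument "dry-run" are hypothetical, for illustration.

flac_to_wav() {
    src_dir=$1
    out_dir=$2
    mode=${3:-run}
    mkdir -p "$out_dir"
    for flac in "$src_dir"/*.flac; do
        [ -e "$flac" ] || continue    # no FLAC files found: nothing to do
        wav="$out_dir/$(basename "$flac" .flac).wav"
        if [ "$mode" = "dry-run" ]; then
            echo "sox $flac $wav"     # only print the command
        else
            sox "$flac" "$wav"        # sox picks the formats from the file extensions
        fi
    done
}

# Demo in dry-run mode with two empty placeholder files.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/flac-demo"
: > "$demo_dir/flac-demo/sample-a.flac"
: > "$demo_dir/flac-demo/sample-b.flac"
dry_run_output=$(flac_to_wav "$demo_dir/flac-demo" "$demo_dir/wav-demo" dry-run)
echo "$dry_run_output"
```

Leaving out the third argument would run sox for real on each file.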

6. Press the Commands button. Manage Plug-ins > Add > Dictation > Dictation > Append text after result: " " (enter just a single space, then press the OK button).

7. Start ksimond. In simon, press the Connect button. simon now starts compiling the speech model. Let’s dictate a few words: “M;nchsfisch Mnchskopf Mhe Mllerin Mllerthal Mllschlucker”. The German ö and ü aren’t displayed. Press the Activated button to stop the recognition.

8. Now let’s copy the files of the base model:

a. Copy hmmdefs:

cp /tmp/kde-ubuntu/simond/default/compile/hmm24/hmmdefs /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-speech-model-dedq/hmmdefs-dedq

b. Copy macros:

cp /tmp/kde-ubuntu/simond/default/compile/hmm24/macros /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-speech-model-dedq/macros-dedq

c. Copy stats:

cp /tmp/kde-ubuntu/simond/default/compile/stats /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-speech-model-dedq/stats-dedq

d. Copy tiedlist:

cp /tmp/kde-ubuntu/simond/default/compile/tiedlist /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-speech-model-dedq/tiedlist-dedq
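The four cp commands follow one pattern, so they could be wrapped in a small loop. The sketch below is only an illustration: the function name copy_base_model is made up, the demo runs on placeholder files in a temporary directory, and unlike the plain cp commands it creates the target directory if it does not exist.

```shell
#!/bin/sh
# Sketch: copy the four base model files and append a model suffix.
# copy_base_model is a hypothetical helper; the demo uses placeholder files.

copy_base_model() {
    compile_dir=$1    # e.g. /tmp/kde-ubuntu/simond/default/compile
    target_dir=$2
    suffix=$3         # e.g. dedq
    mkdir -p "$target_dir"
    # hmmdefs and macros are in the hmm24 subdirectory, stats and tiedlist are not.
    for rel in hmm24/hmmdefs hmm24/macros stats tiedlist; do
        cp "$compile_dir/$rel" "$target_dir/$(basename "$rel")-$suffix" || return 1
    done
}

# Demo with placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/compile/hmm24"
for f in hmm24/hmmdefs hmm24/macros stats tiedlist; do
    echo "placeholder" > "$demo_dir/compile/$f"
done

copy_base_model "$demo_dir/compile" "$demo_dir/model" demo
ls "$demo_dir/model"
```

For the real model, the first argument would be the compile directory shown above and the suffix would be dedq.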

9. It is now time to export the scenario file. Select the option Export to file.

10. You can download the German speech model ‘dedq’.

You now know how I created the German speech model ‘dedq’.

B. The following video demonstrates the dictation / recognition process. Unfortunately, YouTube limits the length to 15 minutes. Take a look at the German speech model ‘dedq’ in action:

These are the words that were recognized in the video:

M;nchengladbach Mnchkloster Mnchsfisch Mnchskopf Mnchskutte Mnchstum Mnsheim Mrderbande Mrtel Mrtelbrett Mrtelgeruch Mckenlarven Mckenschutz Mckenschwarm Mckenspray Mckenstich Mdigkeit Mgeln Mggelturm Mhelosigkeit Mhen Mhle Mhlen Mhlenanordnung Mhlenbedampfung Mhlenbetrieb Mhlendrehvorrichtung Mhleneinsatzdiagramm Mhlenausfallgut Mhlheim Mhlhausen Mhlingen Mhlrad Mhlteich Mlleimer Mllerin Mllers Mllerstrae Mllhaufen Mllpresse Mllhalde Mllhalden Mllhaufen Mllkasten Mllkippe Mllverbrennung Mllverbrennungsanlage Mllzerkleinerer Mnder Mndigkeit Mndungsbremse Mndungsdelta Mndungsdeltas Mndungsfeuer Mndungsfeuerdmpfer Mnsterbau Mnsterbaus Mnsterbauverein Mnsterbauvereins Mnstereifel Mnzer Mnze Mnzenberg Mnzwert Mrrhe Mtze Mtzen Mtzenich Mtzenmacher Mtzenschirm NEUNZEHNHUNDERT Mhlendrehvorrichtung NEUNZEHNHUNDERTACHTZIG Nabelbinde Nabelbruch Nabeln Nabenabdeckung Nabenabstand Nabenbremse Nabendynamo Nachbarplaneten Nachbarstdte Nachbehandelns Nachfolgekandidaten Nachfolgemodelle Nachfolgewert Nachforderungsmanagement Nachforschens Nachfrageinflation Nachfrageintensitt Nachfragelcke Nachfragestruktur Nachhilfekurse Nachhilfekursen Nachladung Nachladungen Nachmittagsschicht Nachrichtensendung Nachrcker Nachrstens Nachschlagebuch Nachschlagewerk Nachschlagewerke Nachschlagewerken Nachschlagewerks Nachschlssel Nachsetzen Nachspeise Nachspeisung Nachspur Nachwuchsausbildung Nachwuchsarbeit Nachwuchsbereich Nachwuchself Nachwuchsfrage Nachwuchskraft Nachwuchsfrderung Nachwuchskrften Nachwuchslufer Nachwuchsmannschaft Nachwuchsschwimmers Nachwuchsspieler Nachwuchsspieler Nachwuchstalent Nachwuchsteam Nachzgler Nachzglers Nachzndung Nadelhlse Nahhandel Nahhandels Nahrungsbestandteil Nahrungsmittelknappheit Nahrungsmittelkonserve Nahrungsmittelvergiftung Nahrungsreserven Nahrungsvorrte Naivitt Naivling Naivlinge Naivlingen Naivlings Namensaktien Namensaufruf Namensobligation Narkoseschwester Namensnderung Nastassia Natalia Natalie Natangen Natascha Nathan Nation Nationalarmee 
Nationalbank Nationalcharakter Nationalchina Nationaldemokratische Nationaleinkommen Nationalelf Nationalfarben Nationalfeiertag Nationalfeind Nationalflagge Nationalfriedhof Nationalgalerie Nationalgetrnk Nationalheld Nationalhelden Nationalheldin Nationalhymne Nationalinstitut Nationalismus Nationalisten Nationalitt Nationalitten Nationalittenkampf Nationalkongress Nationalkommunist Nationalkasse Nationalkonvent Nationalmannschaft Nationalmannschaften Nationalmuseum Nationalpartei Nationaltracht Nationalspiel Nationalspielerin Nationalspielerinnen Nationalspielers Nationalsport Nationalsprache Nationalsynode Nationalteam Nationaltheater Nationaltracht Nationaltruppen Nationalverband Nationalversammlung Nationalversammlungen Nationalwerksttten Natomitglieder Natrium Natriumbikarbonat Natriumnitrit Natriumnitrat Natriumkarbonat Natriumhydroxid Natriumphosphat Natriums Natriumsilikat Natriumstearat Natriumsulfat Naturalabgabe Naturalabgaben Naturaleinkommen Naturaleinkommens Naturalersatz Naturalpacht Naturalwirtschaft Naturanlagen Naturbedingung Naturereignis Naturerlebnis Naturfreund Naturgeschmack Naturgesetz Naturgesetze Naturgesetzen Naturgesetzes Naturgewalt Naturgre Naturgren Naturheilkunde Naturheilkundige Naturkosmetik Naturkautschuk Naturkunde Naturlehrpfad Naturmenschen Naturphilosophie Naturprodukt Naturprozess Naturprozesse Naturprozessen Naturprozesses Naturraum Naturrecht Naturreich Naturschauspiel Naturschilderung Naturschilderungen Naturschutz Naturschutzbund Naturschutzes Naturschutzexperte Naturschutzgebiet Naturschutzgebiete Naturschutzgesetz Naturschutzpark Naturschutzparks Naturschutzstelle Naturschutzbund Naturschden Naturschtze Naturschnheit Naturschtzer Naturschtzern Naturseide Naturstein Natursteinpflasterbelag Natursteinpflasterbelags Natursteinpflasterbelge Natursteinpflasterbelgen Natursteinverkleidung Naturstrand Naturzement Nauheim Nauheims Naumburg Naumburgs naturwissenschaftliche