This article explains how I created the Interlingua PLS dictionary and what the imported result looks like in simon.
A. Creation of the PLS dictionary:
1. Get spelling dictionary.
2. License is GPL. It says in the file README_en.txt:
This spell check dictionary for Interlingua is licensed under GPL. [...] This hyphenation rules for Interlingua are licensed under GPL.
This means that I can use this spelling dictionary as source.
3. Extract dict-ia-2010-11-29.oxt.
4. ISO 639-1 language code is ia.
5. I will probably use this table for grapheme-to-phoneme conversion.
6. Check the encoding of ia_iso.aff and ia_iso.dic. Both files are encoded in ISO 8859-1. It is probably best to convert both files to UTF-8:
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.dic > interlingua-utf8.dic
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.aff > interlingua-utf8.aff
Change the first line of interlingua-utf8.aff to SET UTF-8. Both files use CRLF line endings (Windows style). I don’t know whether the unmunch command accepts this, so I will check it:
It worked. The CRLF line endings come from the source files; the target file uses plain LF (Unix style). There are a lot of duplicate entries. I expect that an .xsl script will remove these duplicates later.
7. Add lexicon tags at the beginning and the end of interlingua-wordlist.
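Step 6 above can also be scripted. The following is only a sketch under assumptions: the function names are made up for illustration, stripping the carriage returns is optional (unmunch coped with the CRLF input), and sort -u is swapped in here for the .xsl script that is meant to remove the duplicates later.

```shell
# Hypothetical helpers; the names are not part of the original workflow.

# Convert an ISO 8859-1 file to UTF-8 and turn CRLF line endings into LF.
# Usage: latin1_to_utf8_lf ia_iso.dic > interlingua-utf8.dic
latin1_to_utf8_lf() {
    iconv -f ISO-8859-1 -t UTF-8 < "$1" | tr -d '\r'
}

# Rewrite the SET line at the top of a converted .aff file.
# Assumption: the first line of the file is the SET line.
# Usage: set_aff_utf8 interlingua-utf8.aff > interlingua-utf8-fixed.aff
set_aff_utf8() {
    sed '1s/^SET .*/SET UTF-8/' "$1"
}

# Remove duplicate entries from a word list (one word per line).
# LC_ALL=C sorts by byte value, so the result does not depend on the locale.
# Usage: dedupe_wordlist interlingua-wordlist > interlingua-wordlist-unique
dedupe_wordlist() {
    LC_ALL=C sort -u "$1"
}
```

Note that dedupe_wordlist also changes the order of the entries, which should not matter for a dictionary import.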
The left column contains the words. The pronunciation column contains the corresponding SAMPA transcriptions. The Category column contains just “Unknown” entries.
Now you know how I created the dictionary and what the result looks like in simon.
This article explains the creation of an Arabic PLS dictionary and what the result looks like in simon.
A. Creation of the dictionary:
1. Get Arabic spelling dictionary.
2. Check the license. Inside the file dict_ar-3.0.oxt there is a file with the name COPYING (in the docs folder). It says in the file:
GPL 2.0/LGPL 2.1/MPL 1.1 tri-license
This means that I can use this tri-licensed spelling dictionary as source for my future GPLv3 PLS dictionary.
3. Now I have to extract dict_ar-3.0.oxt.
4. Let’s try the unmunch command inside the Ubuntu terminal:
The left column contains 457089 Arabic words. The pronunciation column contains the corresponding SAMPA transcriptions. The third column contains just entries with “Unknown”. This is because the PLS dictionary contains no role attributes.
Now you know how I created the dictionary and what the result looks like in simon.
In 2009, I made some initial tests with Hebrew. Now it is time to develop a Hebrew PLS dictionary that is much bigger than the sample dictionary from 2009 (which I have deleted). This article explains how I created the dictionary and what the result looks like when imported into simon.
A. Creation of the dictionary:
1. Get Hebrew spelling dictionary from OpenOffice.org.
2. License is GPL. There is a copyright notice inside the file he_IL.aff.
3. I tried to unmunch the dictionary in the Ubuntu terminal, but unfortunately I failed:
4. The source file he_IL.dic contains a lot of numbers. I remove them with the Ubuntu terminal:
ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ sed 's/[0-9]*//g' he_IL.dic > hebrew-without-numbers
With Geany, I remove the commas (“,”) and the slashes (“/”) that are still left in the file hebrew-without-numbers. Now I have a clean word list with about 43,000 Hebrew words.
5. Add lexicon tags at the beginning and the end of hebrew-without-numbers.
6. Ubuntu terminal:
7. ISO 639-1 language code is he.
8. I need a table for grapheme to phoneme conversion. Maybe I will use this table. There are several tables available at Wikipedia. I am not sure which one I should use. I have an idea: as far as I know, Yiddish and Hebrew share the same alphabet. This means I could try to use the Yiddish improve-yiddish.xsl style sheet:
The result is that most Hebrew letters have been converted into IPA. There is only one Hebrew letter that hasn’t been converted: [א] I will add this phone to the .xsl style sheet with the name improve-hebrew.xsl. Now I try it again:
The result is not so good: Maybe I should adjust the grapheme to phoneme conversion rules for modern standard Israeli Hebrew. Or is this not necessary? I think for a first draft I can use the Yiddish transformation rules.
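The clean-up from step 4 (the sed call plus the manual removal of commas and slashes in Geany) can probably be done in one pass. This is a minimal sketch, assuming that the digits, commas and slashes in he_IL.dic only belong to the word count and the affix flags; the function name clean_hunspell_dic is hypothetical.

```shell
# Hypothetical helper: strip digits, commas and slashes from a .dic file
# and drop lines that end up empty (for example the word count on line 1).
# Usage: clean_hunspell_dic he_IL.dic > hebrew-without-numbers
clean_hunspell_dic() {
    # "|" is used as the sed delimiter so that "/" needs no escaping.
    sed -e 's|[0-9,/]||g' -e '/^$/d' "$1"
}
```

If a word in the source file legitimately contained a digit or a comma, this sketch would damage it, so the output is worth a quick look before the next step.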
B. Download the dictionary. Import it into simon as shadow dictionary.
Take a look at the result: The left column contains 43933 Hebrew words. The pronunciation column contains the corresponding SAMPA transcriptions. The category column is unused (or, to be more exact, displays just Unknown) since the source PLS dictionary contains no role attributes.
Now you know how I created the dictionary and what the result looks like in simon. This dictionary uses more or less Yiddish pronunciation because I was too lazy to adjust it to modern standard Israeli Hebrew. It shouldn’t be a problem to adjust the style sheet improve-hebrew.xsl so that the phoneme results are better.
This article explains some details about the creation of the dictionary and what the result looks like in simon.
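To illustrate what a letter-by-letter grapheme-to-phoneme rule looks like, here is a small sketch. It uses sed instead of the XSLT style sheet improve-hebrew.xsl, it covers only a handful of consonants, and the mapping is an assumption chosen for illustration (it is not copied from the style sheet). Vowels are not handled at all, since unpointed Hebrew spelling does not write most of them.

```shell
# Hypothetical illustration: replace a few Hebrew consonants with IPA symbols.
# Reads words from standard input, one per line. Letters without a rule
# are passed through unchanged, which makes gaps easy to spot.
hebrew_to_ipa_sketch() {
    sed -e 's/א/ʔ/g' \
        -e 's/ג/g/g' \
        -e 's/ד/d/g' \
        -e 's/ל/l/g' \
        -e 's/מ/m/g' -e 's/ם/m/g' \
        -e 's/נ/n/g' -e 's/ן/n/g'
}
```

Running the word list through such a filter and searching the output for remaining Hebrew letters is one way to find phones that are still missing, like the [א] case above.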
A. How I create Ralf's Yiddish dictionary:
1. Get spelling dictionary.
2. License is GPLv3.
3. Extract jidysz.net.ooo.spellchecker.oxt.
4. Ubuntu terminal:
cd /home/ubuntu/Documents/2011-II/Yiddish/dictionaries
sudo apt-get install hunspell-tools
unmunch yi.dic yi.aff > yiddish-wordlist
5. Add <lexicon> at the beginning of yiddish-wordlist. Add </lexicon> at the end of this file.
6. Generate .xml document with lexicon, lexeme and grapheme elements:
Take a look at the result. The left column contains the Yiddish words. This dictionary contains 99980 words. The right column contains the corresponding SAMPA transcription. Yiddish is written in the Hebrew alphabet. The Hebrew alphabet is written from right to left. Obviously, the corresponding SAMPA transcriptions are written from left to right. This means that the phoneme order should be fine.
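Steps 5 and 6 can be sketched with a few lines of shell. This is not the command I used; it is only an illustration under assumptions: the function name wordlist_to_lexicon is made up, the lexicon element is written without attributes, and the phoneme elements are expected to be added in a later step.

```shell
# Hypothetical helper: wrap every word of a word list (one word per line)
# in lexeme and grapheme elements and enclose the result in lexicon tags.
# Usage: wordlist_to_lexicon yiddish-wordlist > yiddish-lexicon.xml
wordlist_to_lexicon() {
    echo '<lexicon>'
    # 1. drop empty lines
    # 2. escape the XML special characters (the ampersand first)
    # 3. wrap the whole line; "&" in the replacement is the matched line
    sed -e '/^$/d' \
        -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's|.*|<lexeme><grapheme>&</grapheme></lexeme>|' "$1"
    echo '</lexicon>'
}
```

The output file name yiddish-lexicon.xml in the usage line is only an example.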
There are a lot of other PLS dictionaries available. Find the PLS dictionary that suits your language.
4. Let’s try and check the option Also import unknown sentences. I don’t know whether this is a good decision. So let’s give it a try.
This is interesting: “words with more than one terminal” – is it now possible to use more than one entry for the role attribute? The current version of Schott’s German dictionary employs just one entry for each role attribute. The PLS standard allows more entries.
Please download and extract Schott’s German utterances. The compressed folder 15000-german-utterances.zip contains a plain text file with more than 15000 utterances. I am the author of these utterances, and I have licensed them under the GPLv3. The utterances are designed to be used in conjunction with Schott’s German dictionary (formerly Ralf’s German dictionary). Every word that occurs in Schott’s German utterances should be included in Schott’s German dictionary, too. I am 99% sure that this is the case, but I can’t guarantee it. If some words are missing in Schott’s German dictionary, please inform me, and I will include them in the next version of the dictionary.
You can import my German utterances using simon’s Import Text option (copy & paste).
5. The import has been completed. There are a lot of lines that contain the Unknown terminal. It would probably have been better if I had not checked the option Also import unknown sentences in step 4.
6. Because simon didn’t react any more, I forced it to quit. I tried to start simon several times, but it wouldn’t start. Now several simon processes are displayed with the status zombie or sleeping. I was able to end these processes. But at the moment, it seems to be impossible to start simon again (a new zombie / sleeping process is created whenever I try to start simon).
Conclusion: I don’t recommend checking the option Also import unknown sentences. I tried the Import Grammar function before without checking this option. simon reacted normally, and everything seemed to be fine.
Yesterday, I installed Qwt 6 and then built simon 0.3.60. It was difficult, but in the end it worked out fine. And look, sam now offers an Export test result button (top right of the screenshot):
I want to export the following information: Filename, Expected result, Actual result, Recognition rate (below 50%). The resulting document should be a simple text file (or XML file or whatever). Is this possible with the current Export test result function of simon?
3. System > Administration > Synaptic Package Manager:
Remove the package “simon” (Mark for Complete Removal).
simon is now not visible any more in Synaptic. So obviously, it has been completely removed.
ubuntu@ubuntu:~/Documents/2011-II/speech2text$ ./build_ubuntu.sh
– The C compiler identification is GNU
– The CXX compiler identification is GNU
– Check for working C compiler: /usr/bin/gcc
– Check for working C compiler: /usr/bin/gcc — works
– Detecting C compiler ABI info
– Detecting C compiler ABI info – done
– Check for working CXX compiler: /usr/bin/c++
– Check for working CXX compiler: /usr/bin/c++ — works
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info – done
CMake Error at cmake/FindZLIB.cmake:25 (MESSAGE):
Could not find ZLIB
Call Stack (most recent call first):
julius/libsent/CMakeLists.txt:3 (find_package)
– Configuring incomplete, errors occurred!
touch: cannot touch `./julius/gramtools/mkdfa/mkfa-1.44-flex/*': No such file or directory
ubuntu@ubuntu:~/Documents/2011-II/speech2text$
6. Question: What do I have to do to get simon from the git repository to build and run?
ubuntu@ubuntu:~/Documents/2011-II/speech2text$ ./build_ubuntu.sh
– Found Portaudio: /usr/lib/libportaudio.so
– Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so
– Found Pthreads: /usr/lib/x86_64-linux-gnu/libpthread.so
– Looking for Q_WS_X11
– Looking for Q_WS_X11 – found
– Looking for Q_WS_WIN
– Looking for Q_WS_WIN – not found.
– Looking for Q_WS_QWS
– Looking for Q_WS_QWS – not found.
– Looking for Q_WS_MAC
– Looking for Q_WS_MAC – not found.
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so;/usr/lib/x86_64-linux-gnu/libXau.so;/usr/lib/x86_64-linux-gnu/libXdmcp.so
– Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so;/usr/lib/x86_64-linux-gnu/libXau.so;/usr/lib/x86_64-linux-gnu/libXdmcp.so – found
– Looking for gethostbyname
– Looking for gethostbyname – found
– Looking for connect
– Looking for connect – found
– Looking for remove
– Looking for remove – found
– Looking for shmat
– Looking for shmat – found
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Looking for include files CMAKE_HAVE_PTHREAD_H
– Looking for include files CMAKE_HAVE_PTHREAD_H – found
– Looking for pthread_create in pthreads
– Looking for pthread_create in pthreads – not found
– Looking for pthread_create in pthread
– Looking for pthread_create in pthread – found
– Found Threads: TRUE
– Looking for _POSIX_TIMERS
– Looking for _POSIX_TIMERS – found
– Found Automoc4: /usr/bin/automoc4
– Found Perl: /usr/bin/perl
– Found Phonon: /usr/include
– Performing Test _OFFT_IS_64BIT
– Performing Test _OFFT_IS_64BIT – Success
– Performing Test HAVE_FPIE_SUPPORT
– Performing Test HAVE_FPIE_SUPPORT – Success
– Performing Test __KDE_HAVE_W_OVERLOADED_VIRTUAL
– Performing Test __KDE_HAVE_W_OVERLOADED_VIRTUAL – Success
– Performing Test __KDE_HAVE_GCC_VISIBILITY
– Performing Test __KDE_HAVE_GCC_VISIBILITY – Success
– Found KDE 4.6 include dir: /usr/include
– Found KDE 4.6 library dir: /usr/lib
– Found the KDE4 kconfig_compiler preprocessor: /usr/bin/kconfig_compiler
– Found automoc4: /usr/bin/automoc4
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found libsamplerate: /usr/lib/libsamplerate.so
– Found ALSA: /usr/lib/libasound.so
– Enabling resample support
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Enabling simon scenario support.
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Could NOT find KdepimLibs (missing: KdepimLibs_CONFIG) (Required is at least version "4.5.60")
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
– Found Qt-Version 4.7.2 (using /usr/bin/qmake)
– Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
CMake Error at cmake/FindQwt6.cmake:101 (MESSAGE):
Could not find Qwt 6.x
Call Stack (most recent call first):
sam/src/CMakeLists.txt:1 (find_package)
– Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found. Stop.
ubuntu@ubuntu:~/Documents/2011-II/speech2text$
1. Open the file de-dv. It contains a list of 1000 words (only the phonetic transcriptions).
2. Start Audacity.
3. Now I read every word in the list. Between each word, there is a pause of 1-2 seconds. Later, Audacity will find the pauses automatically.
4. Mark the whole recording with a double-click.
5. Then select Analyze > Sound Finder…
6. Set the Label starting point to 1.0. Set the Label ending point to 1.0. Why? Because simon has to know the amount of background noise.
7. Let’s eliminate the error at position 237. There was some noise (above 26 dB, I think) at position 237, and not a word. Mark the area with the mouse. Then press the Silence button (see top right of the picture).
8. Let’s have a look at the text file de-dv, and at the audio file. The text file ends with line 841. The Audacity audio file ends with number 841. Both files correspond to each other.
9. Select Audacity > File > Export Labels… The position (starting point and ending point) of each label will be exported into a simple text file. Export the labels to a file named labels.txt.
10. Open labels.txt with Geany. The file labels.txt ends at line 841. The first number of each line indicates the label starting point. The second number indicates the label ending point. The third number indicates the label itself.
You can see in the picture that de-dv is open, too. Both files – labels.txt and de-dv – have a length of exactly 841 lines.
11. Geany > Search > Replace.
Search for: \t\w+$ (\t means tab; \w means alphanumeric character; + means this: “The plus sign indicates that there is one or more of the preceding element”; $ means end of line)
Replace with: (leave empty)
Don’t forget to mark Use regular expressions.
This procedure removes the third column (the label number) from each line.
12. You can see that the third column has been removed thanks to the regular expression procedure.
13. Now it is time to merge both files: labels.txt should be merged with de-dv. This is done via the paste command in the Ubuntu terminal:
14. You can see that the document pasted.txt has a third column: The labels are the phonetic transcriptions!
15. Now let’s go back to Audacity > File > Import > Labels… Take a look at the result. Each label is a phonetic transcription of the corresponding recording.
16. Audacity > File > Export Multiple…
Export format: FLAC files
Export location: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/flac-dedv
Split files based on: Labels
Name files: Using Label/Track Name
Press the Export button.
The result is a PLS dictionary at the following location: file:///media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/lexicon-dedv.xml
19. Now I need a prompts file. This is generated, too, via Ubuntu terminal:
20. Now it is time to upload the package file:///media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedv/german-ipa-flac-files-dedv-20110727.tar.bz2 to Voxforge.
21. Delete file:///home/ubuntu/.kde/share/apps/simon and file:///home/ubuntu/.kde/share/apps/simond.
22. Start simon.
23. I am skipping the next steps. Please read my article German speech model ‘dedq’ to get more details.
24. And now it is time to watch the video about this speech model:
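Steps 11 to 14 above (removing the third column from labels.txt and merging the file with de-dv) can also be done entirely in the terminal. The following is a sketch, not the exact commands I used; the function names are hypothetical, and it assumes that the exported label file is tab-separated with start point, end point and label, as described in step 10.

```shell
# Hypothetical helper: succeed only if both files have the same number of
# lines, since the merge only makes sense in that case.
# Usage: same_line_count labels.txt de-dv && echo "ok"
same_line_count() {
    [ $(wc -l < "$1") -eq $(wc -l < "$2") ]
}

# Hypothetical helper: keep the first two columns (start and end point) of
# the label file and append the matching line of the transcription file as
# the new third column.
# Usage: relabel labels.txt de-dv > pasted.txt
relabel() {
    cut -f1,2 "$1" | paste - "$2"
}
```

Using cut instead of the regular expression has the small advantage that it does not depend on what characters the old label contains.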
Type into the terminal “setxkbmap de” – and then start simon and ksimond.
By the way, my simon version is 0.3.0-1ubuntu8. I installed it a few months ago using this approach.
Yes, the German special characters are displayed correctly. I just dictated: “Mühlingen Müllern Mörike Mörtelgerüche Mühelosigkeit ” – great, it is working now.
This article shows (A.) how I create the German speech model ‘dedq’, and (B.) how I dictate using this speech model.
A. Creation of the German speech model ‘dedq’
1. Delete file:///home/ubuntu/.kde/share/apps/simon and file:///home/ubuntu/.kde/share/apps/simond
2. Start simon.
3. Click the Vocabulary button. Press the Import Dictionary button. Select Target: Active Dictionary. Type of dictionary: PLS dictionary. The location of the PLS dictionary on my computer is as follows: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-ipa-flac-files-dedq-20110726/lexicon-dedq.xml You can find this dictionary file at Voxforge (37 MB; you can extract the dictionary file). Or here is an easier way: you can download the file dedq.xml (right click; Save Link as). The file dedq.xml is a valid PLS dictionary that you can import into simon.
5. Click the Training button. Import training data. Import prompts:
- Prompts: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-ipa-flac-files-dedq-20110726/prompts-dedq
- Base directory: /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/german-0.2.5/object/split/dedq/german-ipa-flac-files-dedq-20110726/flac-dedq
You can get both files from Voxforge (see link above). Please keep in mind that some post-processing has to be enabled.
Go to Settings > Configure simon… > Recordings > Post Processing You can see the post processing command that causes sox to convert the FLAC files to WAV format.
6. Press the Commands button. Manage Plug-ins > Add > Dictation > Dictation > Append text after result: ” ” (enter just a single space, then press the OK button).
7. Start ksimond. simon > Connect button. simon now starts with the compilation of the speech model. Let’s dictate a few words: “M;nchsfisch Mnchskopf Mhe Mllerin Mllerthal Mllschlucker” The German ö and ü aren’t displayed. Press the Activated button to stop the recognition.
You now know how I created the German speech model ‘dedq’.
B. The following video demonstrates the dictation / recognition process. Unfortunately, Youtube limits the length to 15 minutes. Take a look at the German speech model ‘dedq’ in action:
These are the words that were recognized in the video:
You can import the German speech model 'friedrich' (0.5 MB, GPLv3) into simon. The package contains all necessary files: hmmdefs-friedrich, tiedlist-friedrich, macros-friedrich, stats-friedrich, and of course the scenario-file scenario-friedrich.xml.
Do you know how to import the Latin speech model 'xaf' into simon? No? Then read this article, and take a close look at this screenshot:
Short explanation: the Latin speech model 'xaf' contains the files hmmdefs-xaf, tiedlist-xaf, macros-xaf, and stats-xaf. You have to set the correct paths to these files.
Then, you have to use the manage scenario function.
I hope that there is someone out there who tries to import the Latin speech model 'xaf' into simon. My recognition results were poor. But never mind, these are just the first steps.