I want to answer some comments that were made in this blog, and add some further thoughts:
1. comment: OK, I understand how to change REGRETTING to lower-case. But my goal is a speech model with 1000 words. And it would be too much work to change that manually. Of course, this is not a simon issue. It is a dictionary issue that should be fixed by someone.
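Changing 1000 entries by hand is not realistic, but a small script could do it. The following is only a sketch. It assumes a plain-text dictionary with one entry per line, where the word comes first and is followed by whitespace and the pronunciation. The function names and the UTF-8 encoding are assumptions for illustration and are not part of simon.

```python
def lowercase_word(line):
    """Lower-case only the first field (the word) of one dictionary line.

    Assumption: the word is the first whitespace-separated token.
    Everything after the word (separator, phonemes, newline) is kept as-is.
    """
    stripped = line.lstrip()
    if not stripped:
        return line  # blank line: nothing to change
    indent = len(line) - len(stripped)
    word_len = len(stripped.split(None, 1)[0])
    end = indent + word_len
    return line[:indent] + line[indent:end].lower() + line[end:]


def lowercase_dictionary(src_path, dst_path, encoding="utf-8"):
    """Write a copy of the dictionary with all words lower-cased.

    The paths and the encoding are hypothetical; adjust them to your files.
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(lowercase_word(line))
```

The script writes to a new file instead of overwriting the original, so the original dictionary stays available if something goes wrong.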
2. comment: I don’t know where to find khotnewstuff3 and libattaca. You could help me by updating the requirements section in the simon wiki. I need to know which packages I should install with sudo apt-get install. This is the output in the Ubuntu terminal:
$ khotnewstuff3
No command 'khotnewstuff3' found, did you mean:
Command 'khotnewstuff' from package 'kdelibs4c2a' (main)
So I assume that I have to install khotnewstuff.
$ libattaca
libattaca: command not found
I don’t know where to find this package (even Google doesn’t help). Obviously, this command / package doesn’t exist.
Maybe I should try the following:
sudo apt-get install khotnewstuff
./build_ubuntu.sh
I will try that later, not now.
Thanks for the info about trunk / scenarios / no-scenarios. I will continue to use trunk (and forget about scenarios).
3. comment: Yes, I figured out on my own that I can delete both folders: /home/ubuntu-64bit/.kde/share/apps/simon and /home/ubuntu-64bit/.kde/share/apps/simond. But I couldn’t restore my old model (with >200 German words) even though I have all the necessary files available (53MB; link will become invalid soon). A tutorial that explains each step needed to restore an old model (one that was backed up while it was working) would be helpful. I tried a lot of things, but something always went wrong.
Let me give an example: My copy of Dragon NaturallySpeaking Preferred offers export/import functionality. This is great. I can be sure that it is possible to restore my own adapted speech model.
And what about simon? At the moment, simon just offers a dysfunctional Import Trainingsdata function. This is a very weak point because I want to train a model with 1000 German words. And I want to use sam to find poor wav files (and to sort them out). sam seems to be great for finding/fixing wav issues. But the combination of simon and sam has to work, which unfortunately isn’t the case. So if something goes wrong, I want to be able to restore my 580 German wav files (and there will be many more wav files in the future).
simon is now OK for me for recording because my USB sound card now works with simon at 16,000 Hz.
Backup/restore with simon should become much easier than it is now. If you have a vocabulary of just 10 words, you don’t need backup/restore because you can start from scratch without losing much time. But when you have a bigger vocabulary (>200 words), you don’t want to start from scratch. There has to be some way to make it work.
In my opinion, a future version of simon should have an Export speech model button. The export file could be a .tar.gz file that contains all files that are included in /home/ubuntu-64bit/.kde/share/apps/simon/model. And with an Import speech model button this .tar.gz file could be imported. This would make things so much easier.
Or I need a tutorial which explains what exactly I have to do. As you can see from my posts of the last couple of days, I tried a lot of different things to restore my speech model, but I failed.
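The export/import idea can be sketched in a few lines. This is not an existing simon feature, only an illustration of what such buttons might do behind the scenes. The function names are hypothetical, and it is an untested assumption that copying the model folder back is enough for simon to accept the restored model.

```python
import tarfile
from pathlib import Path


def export_model(model_dir, archive_path):
    """Pack the whole model folder into a .tar.gz archive."""
    model_dir = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the folder name, not the full absolute path
        archive.add(model_dir, arcname=model_dir.name)


def import_model(archive_path, target_parent):
    """Unpack a previously exported archive below target_parent.

    Only use this with archives you created yourself, because
    extracting untrusted tar files can overwrite other files.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_parent)
```

For example, export_model("/home/ubuntu-64bit/.kde/share/apps/simon/model", "simon-model.tar.gz") would create the backup, and import_model("simon-model.tar.gz", "/home/ubuntu-64bit/.kde/share/apps/simon") would unpack it again as the model folder.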
4. comment: OK, if I delete the folder /home/ubuntu-64bit/.kde/share/apps/simond, then I delete the file simond.db (which I assume contains information about the username and the password).
If I delete /home/ubuntu-64bit/.kde/share/apps/simond/models, the file simond.db will remain available which means I don’t have to add a user with username and password.
It is good to know about these details because I want to have as much control over the computer as possible.
“document some of your tests on the simon wiki too” – I will think about it. I had written a couple of articles in the simon wiki that were deleted (e.g. an article about PLS). In my opinion, these articles are important for the online marketing of simon. It does no harm if information is repeated two or three times. Of course, I understand that the simon team wants to keep the simon wiki lean.
By the way, the marketing of open-source ASR is pretty bad:
Most CMU Sphinx websites are outdated. The problems with the one at sourceforge are:
* Not so modern style
* No interactivity
* Loosely organized outdated information
* Hard to manage/update
* No CMS/search
Also there is a generic problem with the quality of documentation available. A lot is quite outdated and just confusing.
This is exactly my impression of CMU Sphinx. I don’t know where to start: Sphinx-2, Sphinx-3, Sphinx-4, or pocketsphinx? Here is what I want: A system that recognizes words that I speak. So I want one solution for my problem. I tried this Sphinx tutorial. But I failed. I need a tutorial that I can finish within 20 minutes. If it takes longer, I lose interest.
My idea is to reduce the complexity. For example, imagine the following situation: simon is installed, and you want to import Ralf’s German dictionary. You should be able to achieve this simple goal within 20 minutes. Then you have success. I want to help people to become successful. This is the reason why I started this blog. I want people to benefit as fast as possible.
Or let’s assume that you are an average Windows XP user, and you want to know whether you can use Ubuntu to recognize your spoken German words. You don’t have to install Ubuntu. You don’t have to install simon. You don’t have to know about Ralf's German dictionary. All you have to do is download the video Recognize 200 German words and watch this 13-minute video on your Windows XP computer. Easy, isn’t it? You can learn from me where we are. You don’t lose much time. You can decide when you want to migrate from Windows XP to Ubuntu. I recommend that normal users stay with Win XP. And what about governments and big companies? They should consider migrating to Ubuntu during the next couple of years. ASR for Ubuntu should be usable in, let’s say, five years or so.
Of course, the development could be faster if governments supported the internet instead of fighting against it. E.g. the private households in Germany could use fiber to the home (fiber into every house; inside the house you use WLAN or copper cable). The result would be that we could use simon / simond across the internet via TCP/IP. We don’t need better computers. We need better infrastructure.
Interested people (average computer users who hardly know how to handle the Ubuntu terminal) need some kind of entertainment: Sometimes, they want to lean back and just consume. I am trying to offer this kind of entertainment: You can watch my 13-minute video in which I show that it is possible to dictate more than 200 German words under Ubuntu with 100% accuracy. This video is not scientific. It is easy information about what is possible when you install the following components:
- simon,
- HTK,
- import Ralf's German dictionary into simon, and train a word with simon.
The open-source ASR community has one major problem: insufficient marketing. People want to have one solution that works out of the box. So we have to make it as easy as possible (if we want to involve more people in the development of open-source ASR software). 10,000 users = one developer. So if we get more users into the system, we might get a few developers.
Here is what we need:
1. Someone who has C++/Qt knowledge and who knows how to handle Sphinx commands. This person could get the simon source code, and build a simon fork. This simon fork should handle Sphinx commands instead of HTK commands. I understand what has to be done. But I can’t solve this problem on my own because I don’t have the necessary knowledge. It shouldn’t be that hard for someone who is experienced with C++/Qt/Sphinx to solve this problem.
2. We need pronunciation dictionaries for several languages. The ideal format is the Sphinx format (for non-German dictionaries). I am telling you the easiest solution. Why Sphinx format? Because if you have a Sphinx dictionary, you can
- use this dictionary directly with Sphinx;
- import this Sphinx dictionary into simon.
The import of a dictionary in Sphinx format into simon should be possible without problems. This is what should be done:
a. A German native speaker could get involved in the development of Ralf's German dictionary (and similar dictionaries with German pronunciation). My concept is clear (XML + XSLT = PLS). I can’t do all the work. We need specialised dialect dictionaries (Austrian German, Swiss German, Medical German). I made the first step with the creation of Ralf's German dictionary, Ralf's Austrian German dictionary, and Ralf's Medical German dictionary.
b. Is there a person from Austria who wants to improve Ralf's Austrian German dictionary? A linguist who knows about the details of the IPA phonetics could do this. Of course, you don’t have to be a linguist. You can learn by doing: Learn about the IPA phonemes that are used by Austrian German speakers by improving Ralf's Austrian German dictionary!
c. Maybe there is a medical student who wants to use simon for his or her studies. The headset will become a tool used by many doctors. Dictionaries with specialised medical vocabulary are necessary. Ralf's Medical German dictionary is a start. To achieve this goal, help from a medical student is needed.
Here is why PLS is a great thing:
- no problem to handle right-to-left languages like Hebrew;
- no encoding problems thanks to UTF-8 (OK, I have encoding problems with Ralf’s Polish dictionary);
- the GPLv3 license can be added at the beginning of the XML document (as a comment). There is no need to provide an additional text file containing the GPLv3 license. There is no room for any licensing misunderstanding. XML documents are verbose, but modern computers can handle this verbosity.
- PLS dictionaries are easy for humans to edit. E.g. linguists who are familiar with IPA can edit a PLS dictionary without problems. IPA is much easier to read than Arpabet.
At the moment, my concept is as follows:
- PLS format for German dictionaries.
- Sphinx format for non-German dictionaries. The non-German dictionaries that I am offering in this blog should be transformed from PLS/eSpeak format to Sphinx/Arpabet format. After the dictionary has been converted into Sphinx format (this should be done by a native speaker), you can import it into simon.
I want simon to become a solution that you can use with the following languages: Icelandic, Vietnamese, Russian, Norwegian, and many more languages. Help from native speakers is needed. You can transform the PLS dictionary of your language into Sphinx format. E.g. let’s say you are from Russia. Then you can transform Ralf's Russian dictionary into Sphinx/Arpabet with a simple text editor. After you have done that, you can import the dictionary into simon.
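Part of this transformation could be supported by a script, with the native speaker supplying and checking the phoneme table. The following is a rough sketch under several assumptions: the PLS file uses the standard W3C namespace with lexeme, grapheme and phoneme elements, the output is one "word PHONEME PHONEME ..." entry per line, and the mapping table is supplied by the user. The function names are hypothetical, and the tiny mapping in the usage example is made up for illustration and is not a real conversion table for any language.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}


def map_phonemes(phoneme_string, phoneme_map):
    """Translate a phoneme string symbol by symbol, longest match first.

    Raises ValueError for any symbol that is missing from the map, so a
    native speaker can decide how to handle it instead of silently losing it.
    """
    keys = sorted(phoneme_map, key=len, reverse=True)
    symbols = []
    i = 0
    while i < len(phoneme_string):
        for key in keys:
            if phoneme_string.startswith(key, i):
                symbols.append(phoneme_map[key])
                i += len(key)
                break
        else:
            raise ValueError(
                f"no mapping for {phoneme_string[i]!r} in {phoneme_string!r}")
    return symbols


def pls_to_sphinx(pls_text, phoneme_map):
    """Return Sphinx-style dictionary lines for every lexeme in a PLS document."""
    root = ET.fromstring(pls_text)
    lines = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        word = lexeme.findtext("pls:grapheme", namespaces=PLS_NS)
        phonemes = lexeme.findtext("pls:phoneme", namespaces=PLS_NS)
        if not word or not phonemes:
            continue  # skip incomplete entries
        symbols = map_phonemes(phonemes.strip(), phoneme_map)
        lines.append(word.strip() + " " + " ".join(symbols))
    return lines
```

With a made-up map such as {"h": "HH", "a": "AA", "l": "L", "o": "OW"}, a lexeme with the grapheme hallo and the phoneme string halo would become the line "hallo HH AA L OW". A real table has to be written and checked by someone who knows the language.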