To cut it short: the biggest issue with open-source ASR projects is documentation:
“The biggest issue with sphinx4 is actually documentation. Current poll on CMUSphinx website clearly shows that. Personally I sometimes think that perfect documentation will not help if system doesn’t work, but at least it will make product attractive and easy to use. My idea is that we need to have more developer-level documentation – tutorial, examples, task-oriented howtos. It’s unlikely we’ll be able to write something that is good enough as textbook on speech technologies. But we need to prove the point that it’s possible to build ASR system without understanding who is Welch.”
1. Do you get it? Even if the system doesn’t work, the product will be more attractive! Most of my PLS dictionaries don’t work. But: You can import each PLS dictionary into simon. You can see that
- simon offers PLS import;
- each PLS dictionary can be imported into simon.
The result is that simon is more attractive. It is no problem if the quality of almost all PLS dictionaries is currently too poor for training.
So here is what I am doing: I am using this web-log to document the development of my PLS dictionaries.
2. “we need to have more developer-level documentation – tutorial, examples, task-oriented howtos” – from my point of view, the simon developers should invest more time and energy in explaining what they are doing. E.g. the simon developers improved the import process for very big dictionaries (better performance), but I had to find out on my own. Please, invest more time in developer-level documentation. Explain what you are doing.
3. “build ASR system without understanding who is Welch” – this is exactly right. I don’t understand the Baum–Welch algorithm. I need tutorials that explain how I can build my own ASR system. There are tutorials, but I almost always got stuck at some point. E.g. I tried to follow the Voxforge tutorial, but I couldn’t finish all the steps because at some point something went wrong.
Here is what I want:
- record some wav files (with Audacity or with simon);
- generate the HTK speech model (with simon; with sam; via the Ubuntu terminal);
- back up and restore my speech model with simon (I don’t want to lose my work, i.e. the wav files that I recorded with simon).
It is not necessary to get perfect recognition results. It is necessary to have software that is user-friendly (with an import/export function).
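The backup/restore step from the list above does not need anything fancy. Here is a minimal sketch in Python, assuming the recorded wav files sit in a single directory tree. The function names and directory layout are my own illustration and are not part of simon, which may store its training data differently:

```python
import tarfile
from pathlib import Path


def backup_recordings(source_dir: str, archive_path: str) -> int:
    """Pack every .wav file under source_dir into a gzip-compressed tar archive.

    Returns the number of files archived. source_dir is a hypothetical
    recordings folder, not a path that simon is known to use.
    """
    source = Path(source_dir)
    wav_files = sorted(p for p in source.rglob("*.wav") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as archive:
        for wav in wav_files:
            # Store paths relative to source_dir so the archive restores anywhere.
            archive.add(wav, arcname=str(wav.relative_to(source)))
    return len(wav_files)


def restore_recordings(archive_path: str, target_dir: str) -> int:
    """Unpack an archive created by backup_recordings into target_dir.

    Returns the number of files restored.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only regular files are extracted; parent folders are created as needed.
        members = [m for m in archive.getmembers() if m.isfile()]
        archive.extractall(target, members=members)
    return len(members)
```

Calling `backup_recordings("recordings", "recordings.tar.gz")` after each recording session, and `restore_recordings("recordings.tar.gz", "recordings")` on a fresh installation, would at least keep the wav files safe, even though it says nothing about the speech model that was generated from them.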
simon offers scenarios. Where is the documentation on simon scenarios? Is there a PDF, a video tutorial, or a blog entry that explains the advantages of scenarios?
I completely agree that documentation is indispensable. However, what is often overlooked is that documenting is very time-consuming, and that time will be missing somewhere else.
So if you say I should document even small changes like improving the handling of larger dictionaries (which was mainly a by-product of the new and more efficient scenario code), then I would actually have been too busy to implement it, because I would spend at least 50% of my time documenting.
As you have seen with simon 0.2, I will document the stable version of simon and provide video demonstrations of its features. This is also planned for the 0.3 release.
In the meantime, to answer your question:
Here is the blog entry explaining scenarios including a video demonstration:
http://simon-listens.blogspot.com/2009/12/simon-scenarios.html
Greetings,
Peter