Is eSpeak good or bad?
“Espeak with it amazingly bad speech synthesis quality and even more amazing popularity. Out-of-date synthesis method doesn’t let it be good with any possible modifications.”
I used eSpeak for the creation of my 27 PLS dictionaries (the phonemes were created with the help of eSpeak). I found out that the phoneme quality for German isn’t that bad. It is usable for speech recognition after I made some adjustments with an XSLT style-sheet.
What about the other languages? To be honest: at the moment, I don’t care. I need the 27 PLS dictionaries mainly for propaganda. It is necessary to involve more people in the development of an open source ASR solution.
A Polish native speaker wants to dictate in the Polish language. Or another user wants to dictate in the Vietnamese language. Or someone wants to dictate in the Greek language. These people could take advantage of PLS dictionaries in their own languages.
This is what I want to do: build a PLS dictionary for a Chinese language (e.g. Cantonese, which eSpeak offers as a provisional language). For that I need a GPL-licensed word list with Cantonese words, but I haven’t found one on the internet (the description of this word list should be in English because I don’t understand Cantonese).
Is eSpeak’s synthesis method out of date? I don’t know. At least eSpeak creates phonemes that I can implement in my PLS dictionaries. Is there a program available that produces better results than eSpeak? The program has to work out of the box. I can use eSpeak by simply typing “espeak” into the Ubuntu terminal. And eSpeak can interpret SSML mark-up. That worked fine for me.
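To give an idea of what "out of the box" means here, the following is a minimal sketch of how phonemes for a single word could be fetched from eSpeak with Python. It assumes that the `espeak` command is installed and that a voice named `de` is available; the function names `espeak_command` and `espeak_phonemes` are made up for this illustration. The options used are `-q` (no audio output), `-x` (write phoneme mnemonics to stdout) and `-v` (select the voice).

```python
import subprocess


def espeak_command(word, voice):
    """Build the eSpeak command line for one word (hypothetical helper)."""
    # -q: do not play audio, -x: print phoneme mnemonics, -v: choose the voice
    return ["espeak", "-q", "-x", "-v", voice, word]


def espeak_phonemes(word, voice="de"):
    """Return eSpeak's phoneme mnemonics for a word.

    Assumes the espeak binary is on the PATH; raises an error otherwise.
    """
    result = subprocess.run(
        espeak_command(word, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `espeak_phonemes("Haus", "de")` would then return eSpeak's own phoneme notation for the word, which still has to be converted if the dictionary is supposed to contain IPA.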
My PLS dictionaries are in an early state of development. It should be possible to increase the quality substantially with the help of some engaged native speakers.
Things have to work. To be more precise: it should be possible for users to import a PLS dictionary in their own native language into simon. I made a start by offering 27 PLS dictionaries. At the moment, I am thinking about whether I should offer many more PLS dictionaries. The problem is: I don’t know how I can create the phonemes for the specific language. I will find a work-around for this problem.
Which kind of phonemes should the PLS dictionary contain? There are several possibilities:
- IPA phonemes (like in Ralf's German dictionary), advantage: can easily be edited by linguists;
- eSpeak phonemes (like in Ralf's Polish dictionary), advantage: I didn’t introduce new errors by trying to convert them into IPA;
- SAMPA phonemes (none of my dictionaries uses SAMPA), I don’t see any advantage at the moment.
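For readers who have never looked inside a PLS file, here is a small sketch of how such a dictionary could be written with Python's standard library. The element names (`lexicon`, `lexeme`, `grapheme`, `phoneme`) and the `alphabet` attribute come from the W3C PLS specification; the function name `build_lexicon` and the sample transcription are only illustrations. For non-IPA phonemes, PLS expects a vendor-specific alphabet name starting with `x-`; the value `x-espeak` mentioned below is an assumption, not necessarily what my dictionaries use.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang, alphabet="ipa"):
    """Return a PLS document as a string.

    entries: iterable of (grapheme, phoneme) pairs
    alphabet: "ipa", or a vendor-specific name such as "x-espeak" (assumed)
    """
    # Serialize PLS elements without a prefix (default namespace).
    ET.register_namespace("", PLS_NS)
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries:
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(root, encoding="unicode")


# Illustrative entry only; the transcription is not taken from a real dictionary.
print(build_lexicon([("Haus", "haʊs")], lang="de"))
```

Switching a dictionary from IPA to eSpeak phonemes would, in this sketch, only change the `alphabet` value and the contents of the `phoneme` elements.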
In my opinion, good phoneme quality can be achieved by using IPA phonemes, because IPA phonemes are easy for linguists to read. So what can you learn from this post? If you are a native speaker of Vietnamese, Polish or Greek, you may want to take a closer look at Ralf’s Vietnamese / Polish / Greek dictionary, and think about what you can do to improve the quality.
What advantage do Ralf’s PLS dictionaries offer? They show you a way to make speech recognition work for your native language. As soon as you have a PLS dictionary with acceptable quality for your own language, you can think about using it for training with simon.
You can learn from my blog that you can import
- Ralf's Vietnamese dictionary,
- Ralf's Polish dictionary,
- Ralf's Greek dictionary
into simon. So simon is the target application. If you improve the quality of the specific dictionary, there is a chance that it might work.
And there is another thing that I found out when building / importing each of these 27 PLS dictionaries: the dictionary size should be about 100,000 words (not 1 million words, not 10,000 words). Help is needed to implement a good compression algorithm (like unmunch for OpenOffice.org dictionaries).
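The idea behind such a compression could look roughly like this: store only the word stems together with flags, and generate the inflected forms (including their phonemes) when the dictionary is imported. The following sketch is heavily simplified and is not the real affix format used by OpenOffice.org dictionaries; the rule table, the flags and the function name `expand` are invented for this illustration.

```python
# Hypothetical suffix rules: flag -> (grapheme suffix, phoneme suffix).
RULES = {
    "S": ("s", "s"),
    "N": ("en", "ən"),
}


def expand(stem, phonemes, flags, rules=RULES):
    """Expand one stored stem into all its word forms.

    Returns a list of (grapheme, phoneme) pairs: the stem itself first,
    then one form per flag, in the order the flags are given.
    """
    forms = [(stem, phonemes)]
    for flag in flags:
        grapheme_suffix, phoneme_suffix = rules[flag]
        forms.append((stem + grapheme_suffix, phonemes + phoneme_suffix))
    return forms
```

With these rules, a single stored entry with the flag `S` expands into two dictionary entries. Real languages need more than plain suffixes (stem changes, prefixes, stress shifts in the phonemes), which is exactly why help is needed here.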
The focus should be to
- improve the quality of each PLS dictionary – native speakers should do that;
- integrate an option into simon to automatically download & import each of these PLS dictionaries into simon;
- think about a good compression algorithm for PLS dictionaries (like unmunch) – languages like Spanish, Dutch, German, Latin need such a compression algorithm – not necessary for English.
Tags: PLS