Posts Tagged ‘Vietnamese’

Ralf’s Vietnamese speech model

Friday, May 18th, 2012

Some words about the creation of this speech model:

1. Download Ralf’s Vietnamese dictionary 0.1.1.
2. Create a new “Vietnamese” scenario and import the dictionary as a shadow dictionary.
3. Train 10 words:

ang, chùng, cuốn, gắm, hộp, khem, lếch, ngỗng, thoang, vung

4. Set the grammar to “Unknown”, add the Dictation plugin, synchronize, and activate. Then dictate:

ang vung vung gm hp gm

Letters with diacritics are missing from the output: “gắm” comes out as “gm” and “hộp” as “hp”. I tried to fix that by changing the keyboard layout, but it didn’t help.

5. Get Ralf’s Vietnamese speech model.
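
The dictated output in step 4 is consistent with every non-ASCII character being dropped somewhere between recognition and text output. That is only a hypothesis, not a diagnosis of simon. The following minimal Python sketch just checks it against the ten trained words; the function name drop_non_ascii is made up for illustration.

```python
import unicodedata


def drop_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed.

    The text is normalized to NFC first, so that a letter such as
    "ắ" is a single code point and is dropped as a whole.
    """
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if ch.isascii())


# The ten trained words from step 3.
trained = ["ang", "chùng", "cuốn", "gắm", "hộp",
           "khem", "lếch", "ngỗng", "thoang", "vung"]

if __name__ == "__main__":
    for word in trained:
        # Words without diacritics pass through unchanged.
        print(word, "->", drop_non_ascii(word))
```

If the hypothesis holds, the words without diacritics (ang, khem, thoang, vung) should be the only ones that survive dictation intact.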


Ralf’s Vietnamese dictionary 0.1.1

Monday, May 24th, 2010

Let’s improve Ralf's Vietnamese dictionary:

1. Convert eSpeak phonemes into IPA phonemes:

$ bunzip2 -c '/media/5f6432a3-9a68-45ee-b4b7-11f3b009825a/home/am3msi/Documents/200911/vietnamese/dictionaries/vietnamese-dictionary.xml.bz2' | saxonb-xslt -ext:on -s:- -xsl:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/espeak2ipa.xsl'

2. Download Ralf's Vietnamese dictionary, and import it into simon.
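
The conversion in step 1 can be pictured as a symbol-by-symbol lookup. The Python sketch below is not the espeak2ipa.xsl stylesheet, whose contents are not shown here. It assumes the phonemes are already split into tokens, and the table is a tiny hypothetical excerpt using a few eSpeak mnemonics that follow the Kirshenbaum ASCII convention.

```python
# Hypothetical excerpt of an eSpeak-to-IPA table, for illustration only.
# A real table would be much larger and cover the Vietnamese phonemes.
ESPEAK_TO_IPA = {
    "@": "ə",
    "N": "ŋ",
    "S": "ʃ",
    "Z": "ʒ",
}


def espeak_to_ipa(tokens):
    """Map each eSpeak phoneme token to IPA.

    Tokens without an entry in the table are passed through unchanged.
    """
    return [ESPEAK_TO_IPA.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(espeak_to_ipa(["m", "@", "N"]))
```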

Vietnamese Hanoi dictionary (HTK)

Friday, December 11th, 2009

I just imported a dictionary for the Northern (Hanoi) dialect of Vietnamese as an HTK lexicon. You can download the dictionary. This is the result:


It looks fine. Here is a small excerpt:

mà [mà] m aa2
má [má] m aa3
mả [mả] m aa4
mã [mã] m aa5
mạ [mạ] m aa6
ma [ma] m aa7

If you speak Vietnamese, you should get the concept: each tone of the vowel “a” is modeled as a separate phoneme (aa2, aa3, aa4, aa5, aa6, aa7). This approach should be OK.
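
To make the layout of the excerpt concrete, here is a minimal Python sketch that splits one lexicon line into its parts. It assumes every line has exactly the shape shown above (word, bracketed output form, then phonemes separated by spaces). The names LexiconEntry and parse_lexicon_line are made up for illustration and are not part of simon or HTK.

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    word: str            # the entry as it appears in the lexicon
    output: str          # the bracketed output form, without brackets
    phonemes: List[str]  # the pronunciation, one item per phoneme


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Parse a line such as 'word [output] p1 p2' into its fields."""
    word, output, *phonemes = line.split()
    return LexiconEntry(word, output.strip("[]"), phonemes)


if __name__ == "__main__":
    excerpt = ["mà [mà] m aa2", "má [má] m aa3", "ma [ma] m aa7"]
    for line in excerpt:
        entry = parse_lexicon_line(line)
        # The onset "m" is shared; only the tone-bearing vowel differs.
        print(entry.word, entry.phonemes)
```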

It would be nice if a native speaker tried to record a few Vietnamese words with simon:


It would be interesting to know whether it works since Vietnamese is completely different from English. I recommend that you try to record 10 different Vietnamese words with simon (each word 8 times).

Ralf’s Vietnamese dictionary

Sunday, November 15th, 2009

You can import Ralf’s Vietnamese dictionary (version 0.1; GPLv3) into simon. The dictionary contains about 6,000 words; training is not possible. The phoneme elements contain eSpeak phonemes (not IPA phonemes).