Posts Tagged ‘Vietnamese’

Ralf’s Vietnamese speech model

Friday, May 18th, 2012

Some words about the creation of this speech model:

1. Download Ralf’s Vietnamese dictionary 0.1.1.
2. Create a new “Vietnamese” scenario. Import the dictionary as a shadow dictionary.
3. Train 10 words:

ang, chùng, cuốn, gắm, hộp, khem, lếch, ngỗng, thoang, vung

4. Grammar Unknown. Dictation plugin. Synchronize. Activate. Dictate:

ang vung vung gm hp gm

The accented letters are missing from the output: “gắm” comes out as “gm” and “hộp” as “hp”. I tried to fix that by changing the keyboard layout, but it didn’t help.

5. Get Ralf’s Vietnamese speech model.
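
The dictated output in step 4 matches what you would get if every non-ASCII letter were dropped from the trained words. Here is a minimal Python sketch to check that pattern. The dropping rule is only an assumption about the symptom, not a confirmed diagnosis of simon, and `drop_non_ascii` is a hypothetical helper:

```python
import unicodedata

# Words trained in step 3 and the text that came back from dictation in step 4.
trained = ["ang", "chùng", "cuốn", "gắm", "hộp", "khem", "lếch", "ngỗng", "thoang", "vung"]
dictated = "ang vung vung gm hp gm".split()


def drop_non_ascii(word: str) -> str:
    """Remove every non-ASCII character (assumed failure mode, not confirmed)."""
    composed = unicodedata.normalize("NFC", word)
    return "".join(ch for ch in composed if ord(ch) < 128)


# Map each stripped form back to the trained word(s) that would produce it.
stripped_to_trained = {}
for word in trained:
    stripped_to_trained.setdefault(drop_non_ascii(word), []).append(word)

# Show which trained word each dictated token could come from.
for token in dictated:
    print(token, "<-", stripped_to_trained.get(token, ["?"]))
```

If every dictated token maps back to a trained word, the recognition itself is probably fine and only the text output loses the accented letters.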

Ralf’s Vietnamese dictionary 0.1.1

Monday, May 24th, 2010

Let’s improve Ralf's Vietnamese dictionary:

1. Convert eSpeak phonemes into IPA phonemes:

$ bunzip2 -c '/media/5f6432a3-9a68-45ee-b4b7-11f3b009825a/home/am3msi/Documents/200911/vietnamese/dictionaries/vietnamese-dictionary.xml.bz2' \
    | saxonb-xslt -ext:on -s:- -xsl:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/espeak2ipa.xsl'

2. Download Ralf's Vietnamese dictionary, and import it into simon.

Vietnamese Hanoi dictionary (HTK)

Friday, December 11th, 2009

I just imported a dictionary for the Northern (Hanoi) dialect of Vietnamese. You can download the dictionary. I imported it as an HTK lexicon. This is the result:

[Image: vietnamese-htk]

It looks fine. Here is a small excerpt:

mà [mà] m aa2
má [má] m aa3
mả [mả] m aa4
mã [mã] m aa5
mạ [mạ] m aa6
ma [ma] m aa7

If you speak Vietnamese, you should get the concept: each tone of the vowel “a” is treated as a separate phoneme (aa2, aa3, aa4, aa5, aa6, aa7). This approach should be OK.
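
To make the layout of the excerpt explicit, here is a minimal Python sketch that splits each line into word, bracketed output form, and phoneme list. It assumes every line has exactly the shape shown above (no pronunciation probability, output form always present); `LexiconEntry` and `parse_htk_line` are hypothetical names, not part of simon or HTK:

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    word: str
    output: str
    phonemes: List[str]


def parse_htk_line(line: str) -> LexiconEntry:
    """Split 'WORD [OUTPUT] PHONEME ...' into its three parts."""
    word, output, *phonemes = line.split()
    return LexiconEntry(word, output.strip("[]"), phonemes)


excerpt = """\
mà [mà] m aa2
má [má] m aa3
mả [mả] m aa4
mã [mã] m aa5
mạ [mạ] m aa6
ma [ma] m aa7
"""

entries = [parse_htk_line(line) for line in excerpt.splitlines() if line.strip()]

# The last phoneme of each entry is the tone-specific vowel.
vowel_phonemes = [entry.phonemes[-1] for entry in entries]

for entry in entries:
    print(entry.word, entry.phonemes)
```

All six words share the consonant phoneme `m` and differ only in the vowel phoneme, which is how the lexicon tells the tones apart.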

It would be nice if a native speaker tried to record a few Vietnamese words with simon:

[Image: record-vietnamese]

It would be interesting to know whether it works, since Vietnamese is completely different from English. I recommend that you try to record 10 different Vietnamese words with simon (each word 8 times).

Ralf’s Vietnamese dictionary

Sunday, November 15th, 2009

You can import Ralf's Vietnamese dictionary (version 0.1; GPLv3) into simon. The dictionary contains about 6,000 words; training is not possible. The phoneme elements contain eSpeak phonemes (not IPA phonemes).