Posts Tagged ‘Tagalog’

Ralf’s Tagalog speech model

Thursday, May 17th, 2012

A few notes on how this speech model was created.

1. Download Ralf’s Tagalog dictionary.
2. Create a Tagalog scenario. Delete the old shadow vocabulary.
3. Train ten Tagalog words:

alpabeto, balakang, dobleng, kababayan, mababata, makakita, naunang, palaso, payagang, tumatanda

4. Add grammar Unknown. Add dictation plugin.
5. Actions > Synchronize. Actions > Activate. The recognition result is bad:

balakang balakang mababata balakang balakang payagang balakang mababata

6. Get Ralf’s Tagalog speech model.
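To put a number on "bad" in step 5, the recognized words can be tallied against the ten trained words. This is only a small sketch: the post does not say which words were actually spoken, so it counts what came out rather than computing an error rate. The variable names are mine.

```python
from collections import Counter

# The ten words trained in step 3.
trained = [
    "alpabeto", "balakang", "dobleng", "kababayan", "mababata",
    "makakita", "naunang", "palaso", "payagang", "tumatanda",
]

# The recognition result quoted in step 5.
result = ("balakang balakang mababata balakang balakang "
          "payagang balakang mababata").split()

# How often each word was output, and which trained words never appeared.
counts = Counter(result)
never_recognized = [word for word in trained if word not in counts]

for word, n in counts.most_common():
    print(f"{word}: {n} of {len(result)}")
print("never recognized:", ", ".join(never_recognized))
```

The output is dominated by a single word, and most of the trained vocabulary never shows up at all, which is what you would expect from a model trained on so few samples.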

Ralf’s Tagalog dictionary

Thursday, May 6th, 2010

Here is how I created Ralf’s Tagalog dictionary:

1. Get the spelling dictionary. Its license is GPL.
2. The language code is tl.
3. I don’t need tl_PH.aff because it contains no word generation information. tl_PH.dic is encoded in ISO-8859-1, so I convert it to UTF-8:

ubuntu@ubuntu-desktop:~/Documents/201005/tagalog-dictionary$ iconv -f ISO8859-1 -t UTF-8 < tl_PH.dic > tagalog

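4. I have to prepare the file tagalog for the XSLT step:
a. Search for "\n" and replace with "</grapheme></lexeme>\n<lexeme><grapheme>".
b. Add <lexicon> at the beginning of the file and </lexicon> at the end, and fix the dangling tags that the replacement leaves on the first and last lines.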
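The preparation in step 4 can also be scripted instead of done by search and replace. The following is a minimal sketch, not the procedure I actually used: `words_to_lexicon` is a made-up name, and skipping purely numeric lines is an assumption based on spelling dictionaries in this format usually starting with a word count.

```python
from xml.sax.saxutils import escape


def words_to_lexicon(lines):
    """Wrap each word in <lexeme><grapheme>...</grapheme></lexeme>
    and enclose the whole list in <lexicon>...</lexicon>."""
    parts = ["<lexicon>"]
    for line in lines:
        word = line.strip()
        # Assumption: skip blank lines and a numeric word-count header.
        if not word or word.isdigit():
            continue
        # escape() protects &, < and > so the output stays well-formed.
        parts.append(f"<lexeme><grapheme>{escape(word)}</grapheme></lexeme>")
    parts.append("</lexicon>")
    return "\n".join(parts) + "\n"


print(words_to_lexicon(["2", "alpabeto", "balakang"]), end="")
```

Opening tl_PH.dic with `open("tl_PH.dic", encoding="iso-8859-1")` and writing the result with `encoding="utf-8"` would cover the conversion from step 3 in the same script.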

5. For the grapheme-to-phoneme conversion I can use this table.
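A table-driven conversion like the one in step 5 usually comes down to a longest-match lookup, so that a digraph such as "ng" is matched before the single letter "n". The sketch below only illustrates that idea. `G2P_TABLE` is a one-entry placeholder, not the table mentioned above, and `to_phonemes` is a made-up name. Letters that are not in the table are passed through unchanged.

```python
# Placeholder table for illustration only; the real table is much larger.
G2P_TABLE = {
    "ng": "ŋ",  # digraph, has to win over the single letter "n"
}


def to_phonemes(word, table=G2P_TABLE):
    """Convert a word to a list of phoneme symbols, longest match first."""
    keys = sorted(table, key=len, reverse=True)
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                phonemes.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: keep the letter as it is.
            phonemes.append(word[i])
            i += 1
    return phonemes


print(" ".join(to_phonemes("balakang")))
```

Sorting the keys by length is what makes the digraph take priority. A real table would also have to decide what to do with "ng" across a syllable boundary, which this sketch ignores.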

6. Generate the Tagalog PLS dictionary:

ubuntu@ubuntu-desktop:~/Documents/201005/tagalog-dictionary$ saxonb-xslt -s:tagalog -xsl:improve-tagalog-dictionary.xsl -o:tagalog-dictionary.xml

7. Download Ralf's Tagalog dictionary, and import it into simon.