Posts Tagged ‘Macedonian’

Ralf’s Macedonian speech model

Thursday, May 17th, 2012

Some words about the creation of this speech model:

1. Download Ralf’s Macedonian dictionary.
2. Create a Simon scenario with the name Macedonian.
3. Import Ralf’s Macedonian dictionary as a shadow dictionary.
4. I want to train ten words. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
босите, босово, ботаничко, ботарева, мусадин, негата, негативата, предните, рипало, рипнува

Do you want to add them now?

Press the Yes button.

5. Define “Unknown” as the grammar.
6. Add the Dictation plugin.
7. Press Synchronize, then press Activate. Simon outputs only the space character (because I have configured the Dictation plugin to add a space after each recorded word). I thought that changing the keyboard layout might help, so I tried that. Unfortunately, it doesn’t solve my problem. But I know that Simon is recognizing the words; the problem is only with the output.

8. Download Ralf’s Macedonian speech model.

Ralf’s Macedonian dictionary

Saturday, April 24th, 2010

Let me explain how I created Ralf's Macedonian dictionary:

1. Get the spelling dictionary (license: GPLv2).

2. Convert mk_MK.dic from cp1251 to UTF-8 via Ubuntu terminal:

am3msi@am3msi-desktop:~/Documents/201004/macedonian-dictionary$ iconv -f cp1251 -t UTF-8 mk_MK.dic > macedonian-utf8.dic

3. Convert mk_MK.aff in the same way:

am3msi@am3msi-desktop:~/Documents/201004/macedonian-dictionary$ iconv -f cp1251 -t UTF-8 mk_MK.aff > macedonian-utf8.aff

4. Changing the first line of the file macedonian-utf8.aff from SET microsoft-cp1251 to SET UTF-8 with gedit.
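Steps 2 to 4 could also be scripted instead of using iconv and gedit. The following is only a minimal sketch in Python, not the method used for this dictionary. The function names are hypothetical, and it assumes the .aff file really starts with a SET line, as described above.

```python
from pathlib import Path


def convert_cp1251_to_utf8(raw: bytes) -> str:
    """Decode cp1251 bytes into text; the text is written out as UTF-8 later."""
    return raw.decode("cp1251")


def fix_aff_header(aff_text: str) -> str:
    """Replace a leading 'SET ...' line with 'SET UTF-8'; keep everything else."""
    lines = aff_text.splitlines(keepends=True)
    if lines and lines[0].startswith("SET "):
        first = lines[0]
        # Preserve whatever line ending the original first line had.
        ending = first[len(first.rstrip("\r\n")):]
        lines[0] = "SET UTF-8" + ending
    return "".join(lines)


def convert_file(src: Path, dst: Path, is_aff: bool = False) -> None:
    """Hypothetical helper: read src as cp1251 and write dst as UTF-8."""
    text = convert_cp1251_to_utf8(src.read_bytes())
    if is_aff:
        text = fix_aff_header(text)
    # Write bytes directly so that line endings are not translated.
    dst.write_bytes(text.encode("utf-8"))
```

With the file names from the steps above, this would be called as convert_file(Path("mk_MK.dic"), Path("macedonian-utf8.dic")) and convert_file(Path("mk_MK.aff"), Path("macedonian-utf8.aff"), is_aff=True).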

5. The unmunch command wouldn’t yield any additional Macedonian words.

6. Adding <lexicon> tags to the file macedonian-utf8.dic (<lexicon> at the beginning of the file; </lexicon> at the end of the file).
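Step 6 can be done by hand in an editor, but it could also be sketched in a few lines of Python. The function names below are hypothetical. Escaping XML special characters is an extra precaution that the manual step does not include; I assume it is harmless for a word list.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_in_lexicon(dic_text: str) -> str:
    """Wrap the word list in <lexicon> ... </lexicon> so it parses as XML."""
    body = escape(dic_text)  # turns &, < and > into entities
    if not body.endswith("\n"):
        body += "\n"
    return "<lexicon>\n" + body + "</lexicon>\n"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Checking is_well_formed(wrap_in_lexicon(text)) before running saxonb-xslt would catch a stray special character early.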

7. Generating .xml file:

am3msi@am3msi-desktop:~/Documents/201004/macedonian-dictionary$ saxonb-xslt -ext:on -s:macedonian-utf8.dic -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:macedonian.xml

8. Thinking about the style-sheet improve-macedonian-dictionary.xsl. This style-sheet should contain the language tag mk. The grapheme-to-phoneme conversion should follow this table. Great:

“Macedonian orthography is consistent and phonemic in practice, an approximation of the principle of one grapheme per phoneme. A principle represented by Adelung’s saying, “write as you speak and read as it is written” („пишувај како што зборуваш и читај како што е напишано“).”

This means that Ralf's Macedonian dictionary shouldn’t be too bad.
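To illustrate why a phonemic orthography makes this easy, here is a minimal sketch of a letter-by-letter conversion in Python. The symbol table is my own rough X-SAMPA approximation and an assumption; it is not the table linked above and not the content of improve-macedonian-dictionary.xsl. The name to_sampa is hypothetical.

```python
# Assumed, approximate mapping of the 31 Macedonian letters to X-SAMPA symbols.
SAMPA = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ѓ": "J\\",
    "е": "E", "ж": "Z", "з": "z", "ѕ": "dz", "и": "i", "ј": "j",
    "к": "k", "л": "l", "љ": "L", "м": "m", "н": "n", "њ": "J",
    "о": "O", "п": "p", "р": "r", "с": "s", "т": "t", "ќ": "c",
    "у": "u", "ф": "f", "х": "x", "ц": "ts", "ч": "tS", "џ": "dZ",
    "ш": "S",
}


def to_sampa(word: str) -> str:
    """Map each letter to one phoneme symbol; symbols are separated by spaces."""
    phonemes = []
    for letter in word.lower():
        if letter not in SAMPA:
            # Fail loudly instead of guessing a pronunciation.
            raise ValueError(f"no phoneme known for {letter!r} in {word!r}")
        phonemes.append(SAMPA[letter])
    return " ".join(phonemes)
```

Because each letter maps to exactly one entry, no context rules are needed; a language with a less regular orthography would need far more than a lookup table.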

9. Generating <phoneme> elements via Ubuntu terminal:

am3msi@am3msi-desktop:~/Documents/201004/macedonian-dictionary$ saxonb-xslt -ext:on -s:macedonian.xml -xsl:'improve-macedonian-dictionary.xsl' -o:macedonian-dictionary.xml

10. Download Ralf's Macedonian dictionary, and import it into Simon. Take a look at the Shadow Vocabulary:

[Screenshot: the imported Macedonian dictionary in Simon's Shadow Vocabulary]

Word column: Macedonian words
Pronunciation column: Corresponding SAMPA pronunciation

11. It should be possible to train the word геометриска:

[Screenshot: training the word геометриска in Simon]

At the moment, I am not able to record this word because my sound configuration has to be adjusted. But the important thing is: You should be able to get some initial recognition results with Ralf's Macedonian dictionary. It would be nice if somebody used Ralf's Macedonian dictionary for training of some Macedonian words. I would like to know whether the Macedonian/Cyrillic alphabet causes issues with HTK or not. My guess is that it works, but I haven’t tested it.