Posts Tagged ‘Romanian’

Ralf’s Romanian speech model

Thursday, May 17th, 2012

Some words about the creation of this speech model:

1. Get Ralf’s Romanian dictionary 0.1.1.
2. Create a Simon scenario with the name “Romanian”.
3. Delete the shadow vocabulary.
4. Import Ralf’s Romanian dictionary as shadow dictionary (PLS format).
5. Add ten words to training. Press the Train selected words button. Simon asks:

Your vocabulary does not define all words used in this text. These words are missing:
multicoloara, delapidată, delebil, delectat, diferim, dificilă, diftong, slab, văcar, împânzit

Do you want to add them now?

Press the Yes button.

6. Add grammar “Unknown”. Add dictation plugin.
7. Actions > Synchronize. Actions > Activate. Dictate a few words:

delectat delectat delectat dificil diftong vcar delectat vcar delectat

Not all Romanian letters appear. Some are being omitted.

8. I have to switch the keyboard language. This can be done in Linux Mint (Gnome Classic layout): Linux Mint > System Settings > Keyboard Layout.

9. Unfortunately, switching the keyboard layout doesn’t help. The dictation result stays the same.

10. Export the Romanian scenario. Export the Romanian base model.
11. Download Ralf’s Romanian speech model.
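
The omission noted in step 7 can be checked outside Simon. The Python sketch below is not part of the original workflow. It assumes that the missing letters are exactly the non-ASCII ones and tests that guess against the dictated output. The function names are made up for illustration.

```python
import unicodedata

# Words added to training in step 5.
VOCABULARY = [
    "multicoloara", "delapidată", "delebil", "delectat", "diferim",
    "dificilă", "diftong", "slab", "văcar", "împânzit",
]

# Dictation result from step 7.
DICTATED = "delectat delectat delectat dificil diftong vcar delectat vcar delectat".split()


def strip_non_ascii(word):
    """Drop every non-ASCII character (assumption: this is what gets lost)."""
    composed = unicodedata.normalize("NFC", word)
    return "".join(ch for ch in composed if ch.isascii())


def match_dictation(dictated, vocabulary):
    """Map each distinct dictated token to the trained word it may stand for."""
    by_stripped = {strip_non_ascii(word): word for word in vocabulary}
    return {token: by_stripped.get(token) for token in dict.fromkeys(dictated)}


if __name__ == "__main__":
    for token, word in match_dictation(DICTATED, VOCABULARY).items():
        print(token, "<-", word)
```

If every dictated token maps back to a trained word, the guess holds for this sample: "vcar" would stand for "văcar" and "dificil" for "dificilă".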

Ralf’s Romanian dictionary 0.1.1

Sunday, May 16th, 2010

How I created Ralf's Romanian dictionary version 0.1.1:

1. Version 0.1 contains eSpeak phonemes. They should be converted into IPA phonemes.

2. Take a look at Romanian letters and pronunciation.

3. Off topic: The Romanian Revolution of 1989 was violent. I hope that the next revolution will be peaceful. I support freedom and justice, especially the freedom of speech. Because the German legal system doesn’t guarantee the freedom of speech, I am developing PLS dictionaries for a lot of languages. It would be nice if a native speaker from Romania continued the development of the Romanian PLS dictionary. Let’s defend our freedom of speech with open source ASR software. It would be great if my PLS dictionaries became a part of the upcoming revolution.

4. I can’t find the word Ceauşescu in Ralf's Romanian dictionary. I don’t know why this <grapheme> element is missing. Sorry. At least his forename is in my PLS dictionary:

<lexeme>
<grapheme>Nicolae</grapheme>
<phoneme>n,ikol'ae</phoneme>
</lexeme>

5. Edit the section matches(/lexicon/@xml:lang, 'ro'):

replace($espeak2ipa, 'aU', 'aʊ̯')

This diphthong is available in Ralf's German dictionary, too.

6. Take a look at Romanian phonology – diphthongs. Adjust the replacement rules:

replace($espeak2ipa, 'ea', 'e̯a')
replace($espeak2ipa, 'eI', 'ej')
replace($espeak2ipa, 'eo', 'e̯o')
replace($espeak2ipa, 'eU', 'e̯u')
replace($espeak2ipa, 'iI', 'ij')
replace($espeak2ipa, 'iU', 'ju')
replace($espeak2ipa, 'Oa', 'o̯a')
replace($espeak2ipa, 'uI', 'uj')
replace($espeak2ipa, 'yU', 'ɨw')
replace($espeak2ipa, 'yI', 'ɨj')
replace($espeak2ipa, 'w2', 'wə')

Of course, these are just guesses. At least, you should be able to understand the concept. I use XPath to transform the eSpeak phonemes into IPA phonemes.

7. Obviously, there are a lot of diphthongs and triphthongs in the Romanian language. I had never heard of triphthongs before.

8. Generate PLS dictionary:

$ saxonb-xslt -s:'/media/5f6432a3-9a68-45ee-b4b7-11f3b009825a/home/am3msi/Documents/200911/romanian/romanian-dictionary.xml' -xsl:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/ralfs-ipa-stylesheet.xsl' -o:'/home/ubuntu/Documents/201005/romanian-0.1.1/romanian-dictionary.xml'

9. Download Ralf's Romanian dictionary 0.1.1, and import it into simon. Take a look at the shadow dictionary:

The left column offers 155288 Romanian words. The right column contains the corresponding SAMPA phonemes.

10. I hope that a native speaker will improve this PLS dictionary.
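
The replacement rules from steps 5 and 6 can also be sketched outside XSLT. The Python version below is an illustration only, not the stylesheet used for the dictionary. It applies the same pairs in the listed order with plain string replacement. The names RULES and espeak_to_ipa are hypothetical.

```python
# (eSpeak sequence, IPA replacement) pairs copied from steps 5 and 6.
# \u032f is the combining "non-syllabic" mark, \u028a is ʊ,
# \u0268 is ɨ, \u0259 is ə.
RULES = [
    ("aU", "a\u028a\u032f"),
    ("ea", "e\u032fa"),
    ("eI", "ej"),
    ("eo", "e\u032fo"),
    ("eU", "e\u032fu"),
    ("iI", "ij"),
    ("iU", "ju"),
    ("Oa", "o\u032fa"),
    ("uI", "uj"),
    ("yU", "\u0268w"),
    ("yI", "\u0268j"),
    ("w2", "w\u0259"),
]


def espeak_to_ipa(phonemes):
    """Apply every rule in order, like a chain of nested replace() calls."""
    for espeak_seq, ipa_seq in RULES:
        phonemes = phonemes.replace(espeak_seq, ipa_seq)
    return phonemes


if __name__ == "__main__":
    for sample in ["ea", "aU", "w2", "n,ikol'ae"]:
        print(sample, "->", espeak_to_ipa(sample))
```

The entry for Nicolae passes through unchanged because none of the listed pairs occurs in it. Stress markers such as , and ' are not covered by these rules.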

Import 150,000 Romanian words

Thursday, November 12th, 2009

You can import Ralf's Romanian dictionary (version 0.1; GPLv3) into simon. Training with this dictionary is not possible.

The <phoneme> elements contain eSpeak phonemes (not IPA phonemes) – alphabet="espeak".
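
To see which alphabet a PLS file declares and whether a given word is present, a short script can help. The Python sketch below is a rough illustration. The sample document is hand-written for this example, with the Nicolae entry taken from the dictionary, and the W3C PLS namespace on the root element is an assumption about the file. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not an excerpt of the real file.
SAMPLE_PLS = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="espeak" xml:lang="ro">
  <lexeme>
    <grapheme>Nicolae</grapheme>
    <phoneme>n,ikol'ae</phoneme>
  </lexeme>
</lexicon>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def read_lexicon(xml_text):
    """Return (alphabet, {grapheme: [phonemes]}) for a PLS document."""
    root = ET.fromstring(xml_text)
    entries = {}
    for lexeme in root:
        if local_name(lexeme.tag) != "lexeme":
            continue
        graphemes = [c.text for c in lexeme if local_name(c.tag) == "grapheme"]
        phonemes = [c.text for c in lexeme if local_name(c.tag) == "phoneme"]
        for grapheme in graphemes:
            entries.setdefault(grapheme, []).extend(phonemes)
    return root.get("alphabet"), entries


if __name__ == "__main__":
    alphabet, entries = read_lexicon(SAMPLE_PLS)
    print("alphabet:", alphabet)
    print("Nicolae:", entries.get("Nicolae"))
```

For a real dictionary file, the text would be read from disk and passed to read_lexicon in the same way.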

Some words about the creation of this dictionary: After obtaining a spelling dictionary, I used eSpeak to generate the phonemes from the SSML document.