Some words about the creation of Ralf’s Italian speech model.
1. I took a look at the Italian frequency list. It is licensed under the LGPL – very good.
How I created Ralf's Italian dictionary version 0.1.2:
1. Make some adjustments to Ralf's eSpeak2IPA style-sheet.
2. Transform the eSpeak phonemes (Ralf's Italian dictionary version 0.1.1 contains eSpeak phonemes) into IPA phonemes via the Ubuntu terminal:
$ cat '/media/5f6432a3-9a68-45ee-b4b7-11f3b009825a/home/am3msi/Documents/200911/italian/it_IT/italian-dictionary.xml.bz2' | bunzip2 -k | saxonb-xslt -ext:on -s:- -xsl:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/espeak2ipa.xsl'
Some explanations: The cat command outputs the content of Ralf's Italian dictionary 0.1.1 in compressed form. The special character "|" causes the output of the cat command to be used as input for the bunzip2 command. The output of the bunzip2 command is then used as input for the saxonb-xslt command, which applies the espeak2ipa.xsl style-sheet.
4. A native speaker could improve Ralf's Italian dictionary.
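The pipe chain in step 2 (decompress, then hand the text to a transformation) can be mirrored in Python. This is only a rough sketch: the bz2 module stands in for bunzip2, and the transform argument is a hypothetical placeholder for the XSLT step, not the real espeak2ipa.xsl logic. The sample XML is made up for illustration.

```python
import bz2
from typing import Callable


def decompress_and_transform(compressed: bytes,
                             transform: Callable[[str], str]) -> str:
    """Decompress bzip2 data and pass the text on to the next stage."""
    # Same role as "bunzip2" in the shell pipeline.
    xml_text = bz2.decompress(compressed).decode("utf-8")
    # Same role as the command after the second "|".
    return transform(xml_text)


# Made-up one-line sample, not taken from the real dictionary.
sample = bz2.compress('<lexicon alphabet="espeak"/>'.encode("utf-8"))

# Hypothetical stand-in for the style-sheet: it only renames the alphabet.
result = decompress_and_transform(
    sample, lambda text: text.replace("espeak", "ipa"))
print(result)
```

As in the shell version, each stage only sees the output of the stage before it, so no uncompressed copy of the dictionary has to be written to disk.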
1. I got an Italian spelling dictionary.
2. The unmunch command produced more than 20 million Italian words. Because simon is not intended to handle very large lexicons, I decided to use the style-sheet create-graphemes-italian.xsl instead. This style-sheet removes the prefix/suffix information from the spelling dictionary it_IT.dic. The result was an SSML file with about 90,000 Italian words.
3. I generated the corresponding phonemes from the SSML file:
$ espeak -f italian-audio-o -m -v it -q -x --phonout="italian-espeak"
4. Then I combined the grapheme elements with the phoneme elements.
5. The last step was the conversion from eSpeak phonemes to IPA phonemes with the style-sheet espeak2perfectipa-italian.xsl. Here are some of the Italian-specific conversions that are contained in the style-sheet:
replace($sierra, 'dZ:', 'ddʒ')
replace($sierra, 'ts:', 'ddz')
replace($sierra, 't:', 'tt')
replace($sierra, 'd:', 'dd')
replace($sierra, 's:', 'ss')
replace($sierra, 'b:', 'bb')
replace($sierra, 'k:', 'kk')
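To see what these replacements do to a single phoneme string, here is a small Python sketch that applies the same seven rules in the same order. It is not the style-sheet itself: the function name and the input strings are made up for illustration, and the inputs are not verified eSpeak output.

```python
# Rules copied from the style-sheet excerpt above, in the same order.
GEMINATE_RULES = [
    ("dZ:", "ddʒ"),
    ("ts:", "ddz"),
    ("t:", "tt"),
    ("d:", "dd"),
    ("s:", "ss"),
    ("b:", "bb"),
    ("k:", "kk"),
]


def convert_geminates(phonemes: str) -> str:
    """Replace eSpeak long-consonant markers with doubled IPA consonants."""
    for espeak_form, ipa_form in GEMINATE_RULES:
        # Plain substring replacement; none of the patterns needs a regex.
        phonemes = phonemes.replace(espeak_form, ipa_form)
    return phonemes


# Hypothetical inputs, written in eSpeak style for illustration only.
print(convert_geminates("f'at:o"))
print(convert_geminates("m'adZ:o"))
```

The order of the rules matters: 'ts:' ends in 's:', so the 'ts:' rule has to run before the 's:' rule, otherwise the longer pattern would never match.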
I tried to follow the IPA for Italian. To make the dictionary work with simon (so that training produces reasonable results), the simon import process has to be adjusted. Effective training is currently not possible.
Now you know that an Italian pronunciation dictionary exists that you can import into simon.