Posts Tagged ‘node19’

Ralf’s Latin speech model 0.1.2

Sunday, September 12th, 2010

Download Ralf's Latin speech model version 0.1.2. It contains more than 2000 Latin words (from sections xaa and xaf) in German pronunciation.

Latin speech model ‘xaa’

Sunday, September 12th, 2010

Take a look at Ralf's Latin IPA FLAC files (section: xaa). I want to build a speech model that recognizes these words.

1. I have to prepare a prompts file:

$ cat /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/lexicon-xaa.xml | saxonb-xslt -ext:on -s:- -xsl:/home/ubuntu/Documents/201008/latin-0.1.2/lexiconxaa2prompts.xsl

2. Remove old files:

$ rm -i -r /home/ubuntu/.kde/share/apps/simon
rm: descend into directory `/home/ubuntu/.kde/share/apps/simon'? y
rm: remove directory `/home/ubuntu/.kde/share/apps/simon/model'? y
rm: remove regular file `/home/ubuntu/.kde/share/apps/simon/shadowvocabulary.xml'? y
rm: remove regular file `/home/ubuntu/.kde/share/apps/simon/protocol.log'? y
rm: remove directory `/home/ubuntu/.kde/share/apps/simon'? y

3. Start simon 0.3.0. Manage scenarios > New. Name of the new scenario: latin-xaa

4. Vocabulary > Import dictionary > Target: Active dictionary > PLS lexicon > File: /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/lexicon-xaa.xml

5. Grammar > Add sentence > Add structure: Unknown

6. Training > Import training data > Import Prompts > Prompts: /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/prompts-xaa

The FLAC files first have to be converted to WAV format in an Ubuntu terminal:

mkdir /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa
cd /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/flac-xaa && \
for f in *.flac; do sox "$f" "/home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa/${f%.flac}.wav"; done

Import Prompts > Base directory: /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa > Importing 899 files
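A quick sanity check on the FLAC-to-WAV conversion can be sketched as follows. It counts the files per extension and lists every FLAC file that has no WAV counterpart. The function names count_ext and missing_wav are made up for this illustration and are not part of simon or sox.

```shell
# Hypothetical helper: count the files with a given extension in a directory.
# $1 = directory, $2 = extension without the dot
count_ext() {
    find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Hypothetical helper: print the base name of every FLAC file
# that has no WAV file of the same name in the WAV directory.
# $1 = FLAC directory, $2 = WAV directory
missing_wav() {
    for f in "$1"/*.flac; do
        [ -e "$f" ] || continue            # directory contains no FLAC files
        base=$(basename "$f" .flac)
        [ -f "$2/$base.wav" ] || echo "$base"
    done
}

# Example calls (paths as used in this post):
#   count_ext   /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa wav
#   missing_wav /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/flac-xaa \
#               /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa
```

If missing_wav prints nothing, every FLAC file was converted, and the WAV count should match the number of files the import reports.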

7. Add dictation plug-in.

8. Start ksimond. Connect with ksimond. Press Connect. Press Synchronize. Press Activate. The following error message appears:

Could not start recognition because the system reports that the recognition is not ready.

Please check if you have defined a vocabulary, an appropriate grammar and recorded a few trainings samples.

The system will then, upon synchronization, generate the model which will be used for the recognition.

What did I forget? I have an active vocabulary (each word has a recognition rate of 1). I have defined a grammar (Unknown). I have imported training data. And I have added the dictation plug-in. I checked Settings > Configure simon > Model settings > User generated model. So everything should be in place.

9. I don’t know what went wrong. Any hint?
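One thing that can be checked at this point, without claiming it is the cause of the error above, is whether every entry in the prompts file has a matching WAV file. The sketch below assumes that each prompts line starts with the sample's base name (without extension), followed by the words. That layout is an assumption about the file, and check_prompts is a name made up for this illustration.

```shell
# Hypothetical helper: print every prompts entry without a matching WAV file.
# ASSUMPTION: the first whitespace-separated field of each prompts line is
# the base name of the sample, without the .wav extension.
# $1 = prompts file, $2 = directory with the WAV files
check_prompts() {
    while read -r name _rest || [ -n "$name" ]; do
        [ -n "$name" ] || continue         # skip empty lines
        [ -f "$2/$name.wav" ] || echo "$name"
    done < "$1"
}

# Example call (paths as used in this post):
#   check_prompts /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/prompts-xaa \
#                 /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa
```

No output means that, under the stated assumption, each prompt refers to an existing WAV file.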

[see comments]

12. Download Latin speech model 'xaa'.

Ralf’s Latin speech model 0.1.1

Saturday, August 21st, 2010

Take a look at Ralf's Latin dictionary - German pronunciation (version 0.1.1; 2010-08-21), and download Ralf's Latin speech model 0.1.1. This is still an early attempt. The speech model contains about 800 different Latin words.

Removing words from Latin dictionary

Tuesday, April 13th, 2010

Recently, I imported Ralf's Latin dictionary with its 1.7 million words. The import took about 2 minutes, which I do not find acceptable. The best option is probably to reduce the size of the dictionary.

How can I reduce the dictionary size? I could filter out the <lexeme> elements whose <grapheme> elements end with que, ve, or ne. This should reduce the size of Ralf's Latin dictionary significantly. To do so, I have to add the appropriate lines to the .xsl style sheet. These are the lines in improve-latin-dictionary.xsl that remove these specific <lexeme> elements:

<xsl:choose>
<xsl:when test="ends-with(grapheme, 'que')"/>
<xsl:when test="ends-with(grapheme, 've')"/>
<xsl:when test="ends-with(grapheme, 'ne')"/>
<xsl:otherwise>
<xsl:text>
</xsl:text>
<xsl:element name="lexeme">
<xsl:call-template name="create-latin-role-attribute"/>
<xsl:call-template name="create-latin-grapheme-element"/>
<xsl:call-template name="create-latin-phoneme-element"/><xsl:text>
</xsl:text>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
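To preview the effect of these three tests on a plain word list with one word per line, the same suffix rule can be sketched with grep. This is only an approximation of the style sheet for illustration, and both function names are made up. A pure suffix test also removes ordinary words that merely end in these letters, such as bene or sine, so it is worth looking at what gets dropped.

```shell
# Hypothetical helper: keep only the words that survive the suffix rule.
# Reads a word list (one word per line) on stdin.
keep_after_suffix_rule() {
    grep -Ev '(que|ve|ne)$'
}

# Hypothetical helper: show the words that the suffix rule removes.
dropped_by_suffix_rule() {
    grep -E '(que|ve|ne)$'
}

# Example calls, assuming a word list file named latin-wordlist:
#   keep_after_suffix_rule < latin-wordlist | wc -l
#   dropped_by_suffix_rule < latin-wordlist | head
```

Like ends-with() in the style sheet, both patterns are case-sensitive and match only at the end of the word.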

If you take a look at improve-latin-dictionary.xsl, you can get an impression of how PLS dictionary development is done. If you are a German Latin teacher, you can import Ralf's Latin dictionary (German pronunciation) into simon. If you pronounce the Latin words as if they were German words, you should be able to get some recognition results with simon.

Get the new version of Ralf's Latin dictionary - German pronunciation (version 0.1.2; April 13, 2010). I reduced the dictionary size significantly so that the performance should be acceptable. The dictionary now contains about 470,000 Latin words.
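The number of entries in a PLS file can be checked roughly by counting the opening <lexeme> tags. The sketch below is a plain text count, not an XML parse, and count_lexemes is a name made up for this illustration.

```shell
# Hypothetical helper: count the opening <lexeme> tags in a PLS lexicon file.
# Matches "<lexeme>" as well as "<lexeme role=...>"; closing tags are not counted.
# $1 = path to the lexicon file
count_lexemes() {
    grep -o '<lexeme[ >]' "$1" | wc -l | tr -d ' '
}

# Example call (the file name is a placeholder):
#   count_lexemes latin-dictionary.xml
```

Because grep -o prints each match on its own line, the count stays correct even when several <lexeme> elements share one line.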

Import 1.7 million Latin pronunciations

Thursday, November 5th, 2009

You can import Ralf's Latin dictionary (version 0.1.1) into simon. It contains about 1.7 million Latin words. Some information about how I created the dictionary:

1. The Latin words were extracted from a Latin OpenOffice.org dictionary with the command:

$ unmunch la.dic la.aff > latin-wordlist

2. The phonemes were originally generated with eSpeak (German voice) using the command:

$ espeak -f latin-ssml -m -v de -q -x --phonout="espeak-latin"

This means that no Latin-specific pronunciation rules were applied. The words are pronounced as if they were German.

3. I used this style-sheet to transform the eSpeak phonemes into IPA phonemes.

Ralf's Latin dictionary is licensed under GPLv3. On my computer, the import of the dictionary took about 15 minutes because of its size. So you have to be really patient when you import the dictionary.

It should be possible to train a few Latin words with simon. You have to pronounce the words as if they were ordinary German words (German accent; no Latin-specific vowel length).