Posts Tagged ‘latin’

Latin speech model ‘xcs’

Friday, March 25th, 2011

You can import the Latin speech model ‘xcs’ into simon. The words are from section xcs. They can be found at Voxforge, too. About 50% of the following words were recognized correctly:

coemamus comamini comatum combinabimur combinaturos combinaturas combinetur comedendos comedentibus comederer comedimur comedunto commacularet commacularit commaculasset commaculatas commeabant commeabimur commeamini commeandis commereatur

Latin speech model ‘xcn’

Friday, March 25th, 2011

You can import the Latin speech model ‘xcn’ into simon. It contains the words from section xcn which are available at Voxforge, too. The following words were recognized with a recognition rate of about 40%:

caestui caeptorum caeptos ceperitis certata certatis certatote certaturis certaveratis certavimus certem cessabat cessabunt cessanti cessantis cessaremur cessaris cessatura cessaturum cessaveris cessero cessisti cessuro crepaberis crepabuntur crepaturas crepida crepidinum crepitarimus crepitaro

Latin speech model ‘xci’

Thursday, March 24th, 2011

You can now import the Latin speech model ‘xci’ into simon. The words are from section xci. You can get the corresponding FLAC files from Voxforge. About 50% of the following words were recognized correctly:

hiscite histricae historiis histrionis homicidiis hostilium Homeros hiscendarum honestatibus honorabunt honorans honorarimus honoratur honestissimorum honoraveram horoscopanda horoscoparetur hortatuum Hirtii hostia hispidae hostiam hosticorum humanitas humanitate humanitatem humanitates humanitatis humanitatum

Latin speech model ‘xcd’

Thursday, March 24th, 2011

You can import the Latin speech model ‘xcd’ into simon. You can find the words (section xcd) of this speech model at Voxforge.

Latin speech model ‘xbt’

Tuesday, March 22nd, 2011

You can import the Latin speech model ‘xbt’ into simon. It contains words from section xbt. The words can be found at Voxforge, too.

Latin speech model ‘xbe’

Monday, March 21st, 2011

How I created the Latin speech model ‘xbe’:

1. Make directory via Ubuntu terminal:

mkdir /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/latin-0.1.3/split/xbe-folder/latin-speech-model-xbe

2. Copy hmmdefs:

cp /tmp/kde-ubuntu/simond/default/compile/hmm24/hmmdefs /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/latin-0.1.3/split/xbe-folder/latin-speech-model-xbe/hmmdefs-xbe

3. Copy macros:

cp /tmp/kde-ubuntu/simond/default/compile/hmm24/macros /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/latin-0.1.3/split/xbe-folder/latin-speech-model-xbe/macros-xbe

4. Copy stats:

cp /tmp/kde-ubuntu/simond/default/compile/stats /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/latin-0.1.3/split/xbe-folder/latin-speech-model-xbe/stats-xbe

5. Copy tiedlist:

cp /tmp/kde-ubuntu/simond/default/compile/tiedlist /media/104d991d-2062-40d7-89f6-ddde3cb5b781/home/ubuntu/Documents/2011-i/latin-0.1.3/split/xbe-folder/latin-speech-model-xbe/tiedlist-xbe

6. Get GPL license text:

wget http://script.blau.in/etc/GPL_License

7. You can import the speech model into simon (Manage scenarios and Static model). The source audio files are taken from section xbe, and can be found at Voxforge, too.
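
Steps 1 to 5 can be combined into one small shell function. This is only a sketch: package_model is a hypothetical helper name, and it assumes the same layout as in the commands above, with hmmdefs and macros in the hmm24 subdirectory of simond's compile directory, and stats and tiedlist directly in that directory.

```shell
#!/bin/sh
# Sketch: collect the four compiled model files under section-specific names.
# package_model is a hypothetical helper, not part of simon.
package_model() {
    compile_dir=$1   # e.g. /tmp/kde-ubuntu/simond/default/compile
    target_dir=$2    # directory that will hold the packaged model
    section=$3       # e.g. xbe

    # Create the target directory (and missing parents) first.
    mkdir -p "$target_dir" || return 1

    # Copy each file and append the section name as a suffix.
    cp "$compile_dir/hmm24/hmmdefs" "$target_dir/hmmdefs-$section" &&
    cp "$compile_dir/hmm24/macros"  "$target_dir/macros-$section" &&
    cp "$compile_dir/stats"         "$target_dir/stats-$section" &&
    cp "$compile_dir/tiedlist"      "$target_dir/tiedlist-$section"
}
```

Usage would look like this: package_model /tmp/kde-ubuntu/simond/default/compile ./latin-speech-model-xbe xbe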

Latin speech model ‘xaz’

Friday, February 25th, 2011

You can import the Latin speech model ‘xaz’ into simon. The words are taken from section ‘xaz’. They can be found at Voxforge, too.

Latin speech model ‘xau’

Friday, February 25th, 2011

You can import the Latin speech model ‘xau’ into simon. It contains words from section ‘xau’. You can find these words at Voxforge, too.

Latin speech model ‘xap’

Friday, February 25th, 2011

You can import the Latin speech model ‘xap’ into simon. It contains words from section ‘xap’. You can get the audio files from Voxforge, too.

Latin speech model ‘xak’

Thursday, February 24th, 2011

You can import the Latin speech model ‘xak’ into simon. The words that are used in this speech model can be found in section xak.

Latin speech model ‘xaf’

Tuesday, February 15th, 2011

A few months ago, I published the Latin speech model ‘xaa’. You can find the corresponding audio files at VoxForge, too.

Please download the Latin speech model 'xaf', which contains words from section xaf.

Do you know how to import the Latin speech model 'xaf' into simon? No? Then read this article, and take a close look at this screenshot:

Short explanation: the Latin speech model 'xaf' contains the files hmmdefs-xaf, tiedlist-xaf, macros-xaf, and stats-xaf. You have to set the correct paths to these files.

Then, you have to use the Manage scenarios function.
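
Before setting the paths in simon, it can help to check that all four files are really there. The following is a minimal sketch; check_model_files is a hypothetical helper, and the directory you pass to it is wherever you unpacked the download.

```shell
#!/bin/sh
# Sketch: verify that the four files of a speech model are present and readable.
# check_model_files is a hypothetical helper, not part of simon.
check_model_files() {
    model_dir=$1   # directory that contains the unpacked model
    section=$2     # e.g. xaf
    status=0

    for name in hmmdefs tiedlist macros stats; do
        file="$model_dir/$name-$section"
        if [ ! -r "$file" ]; then
            # Report every missing file, not only the first one.
            printf 'missing: %s\n' "$file" >&2
            status=1
        fi
    done

    return "$status"
}
```

For example, check_model_files . xaf returns 0 only if hmmdefs-xaf, tiedlist-xaf, macros-xaf, and stats-xaf are all in the current directory.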

I hope that someone out there will try to import the Latin speech model 'xaf' into simon. My recognition results were poor. But never mind, these are just the first steps.

Ralf’s Latin speech model 0.1.2

Sunday, September 12th, 2010

Download Ralf's Latin speech model version 0.1.2. It contains more than 2000 Latin words (from sections xaa and xaf) in German pronunciation.

Latin speech model ‘xaa’

Sunday, September 12th, 2010

Take a look at Ralf's Latin IPA FLAC files (section: xaa). I want to build a speech model that recognizes these words.

1. I have to prepare a prompts file:

$ cat /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/lexicon-xaa.xml | saxonb-xslt -ext:on -s:- -xsl:/home/ubuntu/Documents/201008/latin-0.1.2/lexiconxaa2prompts.xsl

2. Remove old files:

$ rm -i -r /home/ubuntu/.kde/share/apps/simon
rm: descend into directory `/home/ubuntu/.kde/share/apps/simon'? y
rm: remove directory `/home/ubuntu/.kde/share/apps/simon/model'? y
rm: remove regular file `/home/ubuntu/.kde/share/apps/simon/shadowvocabulary.xml'? y
rm: remove regular file `/home/ubuntu/.kde/share/apps/simon/protocol.log'? y
rm: remove directory `/home/ubuntu/.kde/share/apps/simon'? y

3. Start simon 0.3.0. Manage scenarios > New. Name of the new scenario: latin-xaa

4. Vocabulary > Import dictionary > Target: Active dictionary > PLS lexicon > File: /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/lexicon-xaa.xml

5. Grammar > Add sentence > Add structure: Unknown

6. Training > Import training data > Import Prompts > Prompts: /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/prompts-xaa

It is necessary to convert the FLAC files into WAV format via Ubuntu terminal:

mkdir /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa
cd /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/flac-xaa && \
for f in *.flac; do sox "$f" "/home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa/${f%.flac}.wav"; done

Import Prompts > Base directory: /home/ubuntu/Documents/201008/latin-0.1.2/split/xaa-folder/wav-xaa > Importing 899 files

7. Add dictation plug-in.

8. Start ksimond. Connect with ksimond. Press Connect. Press Synchronize. Press Activate. The following error message appears:

Could not start recognition because the system reports that the recognition is not ready.

Please check if you have defined a vocabulary, an appropriate grammar and recorded a few trainings samples.

The system will then, upon synchronization, generate the model which will be used for the recognition.

What did I forget? I have an active vocabulary (each word has a recognition rate of 1). I have defined a grammar (Unknown). I have imported training data. And I have the dictation plug-in. I checked Settings > Configure simon > Model settings > User generated model. So everything should be in place.

9. I don’t know what went wrong. Any hint?

[see comments]

12. Download Latin speech model 'xaa'.
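
The FLAC to WAV conversion in step 6 can be checked before the import, because a missing or empty WAV file is easy to overlook among 899 files. A minimal sketch, assuming the WAV files keep the base names of the FLAC files as in the sox loop above; missing_wavs is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list FLAC files that have no (or an empty) WAV counterpart.
# missing_wavs is a hypothetical helper, not part of simon or sox.
missing_wavs() {
    flac_dir=$1   # e.g. .../xaa-folder/flac-xaa
    wav_dir=$2    # e.g. .../xaa-folder/wav-xaa

    for f in "$flac_dir"/*.flac; do
        # If the directory has no FLAC files, the pattern stays unexpanded.
        [ -e "$f" ] || continue
        base=$(basename "$f" .flac)
        # -s is true only for an existing file with a size greater than zero.
        [ -s "$wav_dir/$base.wav" ] || printf '%s\n' "$base"
    done
}
```

If missing_wavs prints nothing, every FLAC file has a non-empty WAV file with the same base name.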

Ralf’s Latin speech model 0.1.1

Saturday, August 21st, 2010

Take a look at Ralf's Latin dictionary - German pronunciation (version 0.1.1; 2010-08-21), and download Ralf's Latin speech model 0.1.1. Of course, this is an early attempt. The speech model contains about 800 different Latin words.

Removing words from Latin dictionary

Tuesday, April 13th, 2010

Recently, I imported Ralf's Latin dictionary with 1.7 million words. The import took about 2 minutes, which I think is not acceptable. The best option is probably to reduce the size of the dictionary.

How can I reduce the dictionary size? I could filter out <lexeme> elements whose <grapheme> elements end with que, ve, or ne. This should reduce the size of Ralf's Latin dictionary significantly. To do this, I have to add the appropriate lines to the .xsl style-sheet. These are the lines in improve-latin-dictionary.xsl that remove these specific <lexeme> elements:

<xsl:choose>
<xsl:when test="ends-with(grapheme, 'que')"/>
<xsl:when test="ends-with(grapheme, 've')"/>
<xsl:when test="ends-with(grapheme, 'ne')"/>
<xsl:otherwise>
<xsl:text>
</xsl:text>
<xsl:element name="lexeme">
<xsl:call-template name="create-latin-role-attribute"/>
<xsl:call-template name="create-latin-grapheme-element"/>
<xsl:call-template name="create-latin-phoneme-element"/><xsl:text>
</xsl:text>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
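
To estimate how many entries such a filter removes, the same suffix test can be tried on a plain word list first (one word per line). This is a rough shell sketch and not part of improve-latin-dictionary.xsl; filter_enclitics is a hypothetical name.

```shell
#!/bin/sh
# Sketch: drop every word that ends with que, ve, or ne.
# Reads one word per line from stdin and writes the remaining words to stdout.
# filter_enclitics is a hypothetical helper name.
filter_enclitics() {
    # grep exits with 1 when no line is left; treat that as success,
    # but still pass on real errors (exit status 2).
    grep -E -v '(que|ve|ne)$' || [ "$?" -eq 1 ]
}
```

For example, filter_enclitics < wordlist | wc -l counts the words that would remain in a file named wordlist. Note that the test is purely orthographic: a word such as bene also ends with ne and is dropped, just as in the style-sheet lines above.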

If you take a look into improve-latin-dictionary.xsl, you can get an impression of how PLS dictionary development is done. If you are a German Latin teacher, you can import Ralf's Latin dictionary (German pronunciation) into simon. If you pronounce the Latin words as if they were German words, you should be able to get some recognition results with simon.

Get the new version of Ralf's Latin dictionary – German pronunciation (version 0.1.2; April 13, 2010). I reduced the dictionary size significantly so that the performance should be acceptable. The dictionary now contains about 470,000 Latin words.

Import 1.7 million Latin pronunciations

Thursday, November 5th, 2009

You can import Ralf's Latin dictionary (version 0.1.1) into simon. It contains about 1.7 million Latin words. Some information about how I created the dictionary:

1. The Latin words were extracted from a Latin OpenOffice.org dictionary with the command:

$ unmunch la.dic la.aff > latin-wordlist

2. The phonemes were originally generated with eSpeak (German voice) using the command:

$ espeak -f latin-ssml -m -v de -q -x --phonout="espeak-latin"

This means that no Latin-specific pronunciation rules were applied. The words are pronounced as if they were German.

3. I used this style-sheet to transform the eSpeak phonemes into IPA phonemes.
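
The input file latin-ssml used in step 2 is not shown here. One possible way to build such a file from latin-wordlist is sketched below. The markup (one <s> element per word inside <speak>) and the function name wordlist_to_ssml are assumptions for illustration, not necessarily what was used for the dictionary.

```shell
#!/bin/sh
# Sketch: wrap a word list (one word per line on stdin) into a minimal SSML file.
# The element layout is an assumption; wordlist_to_ssml is a hypothetical helper.
# Words are written as-is, so the list must not contain &, < or >.
wordlist_to_ssml() {
    printf '<speak>\n'
    while IFS= read -r word; do
        # Skip empty lines in the word list.
        if [ -n "$word" ]; then
            printf '<s>%s</s>\n' "$word"
        fi
    done
    printf '</speak>\n'
}
```

Under these assumptions, wordlist_to_ssml < latin-wordlist > latin-ssml would produce the file that the espeak command in step 2 reads with -f and -m.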

Ralf's Latin dictionary is licensed under GPLv3. On my computer, the import of the dictionary took about 15 minutes because of its size, so you have to be really patient when you import it.

It should be possible to train a few Latin words with simon. You have to pronounce the words as if they were normal German words (German accent; no Latin-specific vowel length).