Posts Tagged ‘de’

Tutorial: import German dictionary

Thursday, June 17th, 2010

This article explains how to import Ralf's German dictionary into simon.

1. Start simon via Ubuntu > Applications > Universal Access > simon.

2. Press the Vocabulary button.

3. Press Import Dictionary.

4. The target is the Shadow Dictionary. Press the Next button.

5. Select PLS Lexicon. Press the Next button.

6. Select the path to Ralf's German dictionary.

7. Download Ralf's German dictionary, and specify the path. Press the Next button.

8. Wait a few seconds until simon has finished importing the PLS dictionary.

9. Ralf's German dictionary has been imported successfully. Press the Finish button.

10. Where is the list with the imported words? You don’t see any words because the Active Vocabulary tab is open. Press the Shadow Vocabulary tab.

Note for the simon developers: If someone imports a dictionary as a Shadow Dictionary (step 4 in this tutorial), simon should switch automatically to the Shadow Vocabulary tab after the import has finished.

11. You can now see the Shadow Vocabulary. The first column contains the word. The second column displays the corresponding SAMPA transcription. The third column contains grammar information (e.g. Zahlwort, Substantiv, Adjektiv, Verb).
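To see which parts of the dictionary file feed these three columns, here is a minimal Python sketch that reads word, transcription and grammar role from a small lexicon excerpt. It is only an illustration, not how simon imports the file: the `<lexicon>` wrapper without a namespace and the function name `vocabulary_rows` are assumptions, and the transcription is printed as stored in the file (IPA in this excerpt) rather than converted to SAMPA.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry excerpt; the <lexicon> wrapper is assumed.
SAMPLE = """<lexicon>
<lexeme role="Verb">
<grapheme>ausgeben</grapheme>
<phoneme>ʔaʊ̯sgeːbən</phoneme>
</lexeme>
<lexeme role="Substantiv">
<grapheme>Rakete</grapheme>
<phoneme>ʔʀakətə</phoneme>
</lexeme>
</lexicon>"""


def vocabulary_rows(xml_text):
    """Return (word, transcription, grammar role) tuples, one per <phoneme>."""
    rows = []
    for lexeme in ET.fromstring(xml_text).iter("lexeme"):
        word = lexeme.findtext("grapheme")
        role = lexeme.get("role", "")
        for phoneme in lexeme.iter("phoneme"):
            rows.append((word, phoneme.text, role))
    return rows


for row in vocabulary_rows(SAMPLE):
    print(row)
```

A lexeme with several `<phoneme>` elements yields one row per pronunciation in this sketch.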

Conclusion: Now you know how you can import Ralf's German dictionary into simon.

Ralf’s German dictionary 0.1.9.7

Thursday, June 17th, 2010

This article explains how I am preparing Ralf's German dictionary 0.1.9.7. At the moment, I am focusing on the creation of new <phoneme> elements:

1. Add the following rule to improve-german.xsl:

<xsl:when test="ends-with(lower-case(../grapheme), 'ier') and
ends-with(., 'iːʀ') and
not(ends-with(../grapheme, 'eier'))"><xsl:value-of select="replace($sierra, 'iːʀ', 'iːɐ̯')"/></xsl:when>

2. Add this rule:

<xsl:when test="ends-with(../grapheme, 'gen') and
not(ends-with(../grapheme, 'ngen'))"><xsl:value-of select="replace($sierra, 'gən', 'gŋ̩')"/></xsl:when>
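As a reading aid for the two rules above, here is a minimal Python sketch of the same conditions. It is not part of improve-german.xsl: the function names are hypothetical, `$sierra` is assumed to hold the text of the current <phoneme> element, and the sample transcriptions are illustrative rather than taken from the dictionary.

```python
def ier_variant(grapheme, phoneme):
    """Rule 1: 'iːʀ' becomes 'iːɐ̯' for words in -ier, but not in -eier."""
    if (grapheme.lower().endswith("ier")
            and phoneme.endswith("iːʀ")
            and not grapheme.endswith("eier")):
        # Like XPath replace(), str.replace() changes every occurrence.
        return phoneme.replace("iːʀ", "iːɐ̯")
    return None  # the rule does not apply


def gen_variant(grapheme, phoneme):
    """Rule 2: 'gən' becomes 'gŋ̩' for words in -gen, but not in -ngen."""
    if grapheme.endswith("gen") and not grapheme.endswith("ngen"):
        return phoneme.replace("gən", "gŋ̩")
    return None


# Hypothetical sample entries.
print(ier_variant("Papier", "papiːʀ"))
print(ier_variant("Feier", "faɪ̯ɐ"))
print(gen_variant("sagen", "zaːgən"))
print(gen_variant("singen", "zɪŋən"))
```

The second and fourth calls return None because the guard conditions exclude those words.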

3. Run the following command in the Ubuntu terminal to test whether improve-german.xsl produces the desired results:

ubuntu@ubuntu-desktop:~$ saxonb-xslt -ext:on -s:'/home/ubuntu/Documents/201006/german-0.1.9.5/german-0.1.9.6.xml' -xsl:'/home/ubuntu/Documents/201005/german-0.1.9.4/improve-german.xsl' -o:'/home/ubuntu/Documents/201006/german-0.1.9.5/german-0.1.9.7.xml'

4. Add the following rule:

<xsl:when test="contains(../grapheme, 'gens') and
not(ends-with(../grapheme, 'ngens'))"><xsl:value-of select="replace($sierra, 'gəns', 'gŋ̩s')"/></xsl:when>

What does this rule create? Here is an example:
Source dictionary:

<lexeme role="Substantiv">
<grapheme>Volkswagens</grapheme>
<phoneme>fɔlksvagəns</phoneme>
</lexeme>

Target dictionary:

<lexeme role="Substantiv">
<grapheme>Volkswagens</grapheme>
<phoneme>fɔlksvagəns</phoneme>
<phoneme>fɔlksvagŋ̩s</phoneme>
</lexeme>

You can see that I am using XSLT to produce additional <phoneme> elements.
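The effect on the Volkswagens entry can be reproduced outside XSLT. The following Python sketch mirrors only the outcome shown above (the original <phoneme> is kept and a variant is appended). How improve-german.xsl achieves this is not shown in this post, so the function `add_gens_variant` and its duplicate check are assumptions.

```python
import xml.etree.ElementTree as ET

SOURCE = """<lexeme role="Substantiv">
<grapheme>Volkswagens</grapheme>
<phoneme>fɔlksvagəns</phoneme>
</lexeme>"""


def add_gens_variant(lexeme):
    """Append a syllabic-nasal variant for each 'gəns' pronunciation."""
    grapheme = lexeme.findtext("grapheme")
    # Same test as the rule: contains 'gens' and does not end in 'ngens'.
    if "gens" not in grapheme or grapheme.endswith("ngens"):
        return lexeme
    existing = [p.text for p in lexeme.findall("phoneme")]
    for text in list(existing):
        variant = text.replace("gəns", "gŋ̩s")
        if variant not in existing:  # skip unchanged or duplicate forms
            extra = ET.SubElement(lexeme, "phoneme")
            extra.text = variant
            existing.append(variant)
    return lexeme


lexeme = add_gens_variant(ET.fromstring(SOURCE))
print(ET.tostring(lexeme, encoding="unicode"))
```

Running the function a second time on the same element adds nothing, because the variant is already present.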

5. Add this rule:

<xsl:when test="ends-with(../grapheme, 'bens')"><xsl:value-of select="replace($sierra, 'bəns', 'bm̩s')"/></xsl:when>

Example: Source dictionary:

<lexeme role="Substantiv">
<grapheme>Schneetreibens</grapheme>
<phoneme>ʃneːtʀaɪ̯bəns</phoneme>
</lexeme>

Target dictionary:

<lexeme role="Substantiv">
<grapheme>Schneetreibens</grapheme>
<phoneme>ʃneːtʀaɪ̯bəns</phoneme>
<phoneme>ʃneːtʀaɪ̯bm̩s</phoneme>
</lexeme>

6. Add the following rule:

<xsl:when test="ends-with(../grapheme, 'ben')"><xsl:value-of select="replace($sierra, 'bən', 'bm̩')"/></xsl:when>

Example from the source dictionary:

<lexeme role="Verb">
<grapheme>ausgeben</grapheme>
<phoneme>ʔaʊ̯sgeːbən</phoneme>
</lexeme>

Target dictionary:

<lexeme role="Verb">
<grapheme>ausgeben</grapheme>
<phoneme>ʔaʊ̯sgeːbən</phoneme>
<phoneme>ʔaʊ̯sgeːbm̩</phoneme>
</lexeme>

With this rule, I added about 1400 <phoneme> elements. It would be too much work to do this manually. Thanks to saxonb-xslt I can work efficiently and precisely.
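To get a feeling for how such a count comes about, here is a small table-driven Python sketch that applies rules 5 and 6 to a three-entry excerpt and reports how many <phoneme> elements it added. It is a sketch under assumptions, not the stylesheet itself: the `<lexicon>` wrapper, the function `add_variants`, the order of the rules and the check against duplicates are all hypothetical.

```python
import xml.etree.ElementTree as ET

# (grapheme suffix, phoneme pattern, replacement), as in rules 5 and 6.
RULES = [
    ("bens", "bəns", "bm̩s"),
    ("ben", "bən", "bm̩"),
]

# Hypothetical excerpt: two lexemes from the examples above plus one
# entry that no rule matches; the <lexicon> wrapper is assumed.
LEXICON = """<lexicon>
<lexeme role="Substantiv"><grapheme>Schneetreibens</grapheme><phoneme>ʃneːtʀaɪ̯bəns</phoneme></lexeme>
<lexeme role="Verb"><grapheme>ausgeben</grapheme><phoneme>ʔaʊ̯sgeːbən</phoneme></lexeme>
<lexeme role="Substantiv"><grapheme>Rakete</grapheme><phoneme>ʔʀakətə</phoneme></lexeme>
</lexicon>"""


def add_variants(root):
    """Apply the first matching rule per lexeme; return the number of added phonemes."""
    added = 0
    for lexeme in root.iter("lexeme"):
        grapheme = lexeme.findtext("grapheme")
        for suffix, old, new in RULES:
            if not grapheme.endswith(suffix):
                continue
            known = {p.text for p in lexeme.findall("phoneme")}
            for text in sorted(known):
                variant = text.replace(old, new)
                if variant not in known:
                    ET.SubElement(lexeme, "phoneme").text = variant
                    known.add(variant)
                    added += 1
            break  # assumed: like xsl:when in xsl:choose, only the first match fires
    return added


root = ET.fromstring(LEXICON)
print(add_variants(root), "phoneme elements added")
```

On this excerpt the sketch adds two elements, one for Schneetreibens and one for ausgeben; Rakete is left untouched.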

7. If you have suggestions for improving Ralf's German dictionary, please tell me. The dictionary currently contains 384067 <lexeme> elements. For now I don’t want to add more words; I want to improve the quality of the phonemes. Many <grapheme> elements have more than one possible pronunciation, so my current focus is to add a <phoneme> element wherever it seems appropriate. The same goes for anything you find missing.


Ralf’s German dictionary 0.1.9.6

Friday, June 11th, 2010

I created Ralf's German dictionary version 0.1.9.6 in the Ubuntu terminal:

ubuntu@ubuntu-desktop:~$ saxonb-xslt -ext:on -s:'/home/ubuntu/Documents/201006/german-0.1.9.5/german-0.1.9.5.xml' -xsl:'/home/ubuntu/Documents/201005/german-0.1.9.4/improve-german.xsl' -o:'/home/ubuntu/Documents/201006/german-0.1.9.5/german-0.1.9.6.xml'

A lot of terminal information (mostly Verb and Adjektiv) has been added. About 90% of all <lexeme> elements now contain terminal information.

Some words are marked incorrectly. For example, the word Übungsheften is marked as Verb, but it should be marked as a noun (Substantiv). I will fix that later.

Ralf’s German dictionary 0.1.9.5

Friday, June 4th, 2010

Ralf's German dictionary version 0.1.9.5 includes the following replacement rules:

replace($sierra, 't͡suːɔʀdn', 't͡suːʔɔʀdn')
replace($sierra, 'vɐbɛnd', 'fɐbɛnd')
replace($sierra, 'ɔʏ', 'ɔɪ̯')
replace($sierra, 'œstɛʀʀaɪ̯ç', 'øːstɐʀaɪ̯ç')
replace($sierra, 'stʊnd', 'ʃtʊnd')
replace($sierra, 'o:', 'oː')
replace($sierra, 'ts', 't͡s')
replace($sierra, 'gyltɪg', 'gʏltɪg')
replace($sierra, 'shoːv', 'shoʊ̯')
replace($sierra, 'S', 'ʃ')

The transcription [ɔʏ] has been replaced by [ɔɪ̯] to make the dictionary more consistent. I think that Wikipedia uses [ɔ͡ʏ] while Wiktionary uses [ɔɪ̯]:

Heu /[hɔɪ̯]/, Läufer /[ˈlɔɪ̯fɐ]/, neu /[nɔɪ̯]/

Both transcriptions would be correct. But I want to use only [ɔɪ̯].
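The replacement list can be tried out with a few lines of Python. This is a sketch under assumptions: it applies all pairs one after another as plain substring replacements, whereas the post does not say whether improve-german.xsl chains them or uses separate branches. The function name `normalise` and the sample inputs are hypothetical.

```python
# Replacement pairs copied from the list above, applied in the listed order.
REPLACEMENTS = [
    ("t͡suːɔʀdn", "t͡suːʔɔʀdn"),
    ("vɐbɛnd", "fɐbɛnd"),
    ("ɔʏ", "ɔɪ̯"),
    ("œstɛʀʀaɪ̯ç", "øːstɐʀaɪ̯ç"),
    ("stʊnd", "ʃtʊnd"),
    ("o:", "oː"),
    ("ts", "t͡s"),
    ("gyltɪg", "gʏltɪg"),
    ("shoːv", "shoʊ̯"),
    ("S", "ʃ"),
]


def normalise(phoneme):
    """Apply every replacement in turn to one transcription."""
    for old, new in REPLACEMENTS:
        phoneme = phoneme.replace(old, new)
    return phoneme


# Hypothetical inputs, each chosen to trigger a single rule.
for sample in ["nɔʏ", "hɔʏ", "bo:t"]:
    print(sample, "->", normalise(sample))
```

Because later pairs see the output of earlier ones, the order of the table matters in this sketch; for instance, the ASCII colon in "o:" is turned into the IPA length mark before any later rule runs.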

Ralf’s German dictionary 0.1.9.4

Friday, June 4th, 2010

How I create Ralf's German dictionary version 0.1.9.4:

1. Editing improve-german.xsl:
If test="ends-with(grapheme, 'gehst')", then <xsl:text>Verb Singular Gegenwart Indikativ</xsl:text>.

2. Doing several replacements with Geany:
replaced 48 occurrences of “ʃadən” with “ʃaːdən”.
replaced 110 occurrences of “vɐlʊst” with “fɐlʊst”.
replaced 24 occurrences of “veːʀɛndəʀʊŋ” with “fɛʀɛndəʀʊŋ”.
replaced 290 occurrences of “bətʀiːbs” with “bətʀiːps”.
replaced 62 occurrences of “tsʊstɛl” with “t͡sʊʃtɛl”.

3. Edit improve-german.xsl again:
When contains(lower-case(grapheme), 'quer'), then replace($sierra, 'kvɛʀ', 'kveːʀ').
replace($sierra, 'stʀaɪ̯t', 'ʃtʀaɪ̯t')
replace($sierra, 'ʔœl', 'ʔøːl')

4. Create version 0.1.9.4 via Ubuntu terminal:

ubuntu@ubuntu-desktop:~$ saxonb-xslt -ext:on -s:'/home/ubuntu/Documents/201005/german-0.1.9.4/german-0.1.9.3.xml' -xsl:'/home/ubuntu/Documents/201005/german-0.1.9.4/improve-german.xsl' -o:'/home/ubuntu/Documents/201005/german-0.1.9.4/german-0.1.9.4.xml'

5. Version 0.1.9.4 is slightly better than the previous version. I will continue with the improvement of this dictionary.
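Steps 1 and 3 above can be sketched in Python as follows. This is a rough illustration, not the stylesheet: the function names are hypothetical, the two replacements listed without a condition are assumed to apply to every entry, and the sample words and transcriptions are made up for the demonstration.

```python
def role_for(grapheme):
    """Step 1: tag forms ending in 'gehst' with their grammar information."""
    if grapheme.endswith("gehst"):
        return "Verb Singular Gegenwart Indikativ"
    return None  # this rule assigns no role


def fix_phoneme(grapheme, phoneme):
    """Step 3: the 'quer' fix is conditional; the other two are assumed unconditional."""
    if "quer" in grapheme.lower():
        phoneme = phoneme.replace("kvɛʀ", "kveːʀ")
    phoneme = phoneme.replace("stʀaɪ̯t", "ʃtʀaɪ̯t")
    phoneme = phoneme.replace("ʔœl", "ʔøːl")
    return phoneme


# Hypothetical sample entries.
print(role_for("weggehst"))
print(fix_phoneme("Querflöte", "kvɛʀfløːtə"))
print(fix_phoneme("Rakete", "ʔʀakətə"))
```

The last call returns its input unchanged, since none of the three patterns occurs in it.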

German Tastatur scenario: Rakete

Tuesday, May 18th, 2010

I took a look into the German Tastatur scenario:

<word>
<name>Rakete</name>
<pronunciation>gls R a k @ t @</pronunciation>
<terminal>Taste</terminal>
</word>

At the moment, I don’t care about the glottal stop. I care about the wrong transcription: the first e should be a long vowel, not a short one. Compare the entry in Ralf’s German dictionary:

<lexeme role="Substantiv">
<grapheme>Rakete</grapheme>
<phoneme>ʔʀakətə</phoneme>
</lexeme>

The correct IPA transcription is <phoneme>ʔʀakeːtə</phoneme>. I can say the word either way, and simon types the letter "r" every time.

You see: it is working, but there is still a lot of work to do.
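To compare the scenario's pronunciation string with the IPA form in the dictionary, a small Python sketch can translate one into the other. The token table is inferred only from the Rakete example in this post and is not simon's full phoneme inventory. The token "e:" for the long vowel follows SAMPA convention, but whether the scenario format spells it that way is an assumption, as is the function name `to_ipa`.

```python
# Token-to-IPA table inferred from the Rakete example only
# (scenario: "gls R a k @ t @", dictionary: "ʔʀakətə").
TOKEN_TO_IPA = {
    "gls": "ʔ",
    "R": "ʀ",
    "a": "a",
    "k": "k",
    "@": "ə",
    "t": "t",
    "e:": "eː",  # assumed token for the long e-vowel
}


def to_ipa(pronunciation):
    """Convert a space-separated scenario pronunciation to an IPA string."""
    return "".join(TOKEN_TO_IPA[token] for token in pronunciation.split())


print(to_ipa("gls R a k @ t @"))   # transcription currently in the scenario
print(to_ipa("gls R a k e: t @"))  # variant with a long first e-vowel
```

Any token missing from the table raises a KeyError, which makes gaps in the mapping visible immediately.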