How I create Ralf's Swiss German dictionary:
1. Get the spelling dictionary. It is licensed under the GPL.
2. The language code is de-CH (not mentioned on Wikipedia, but you know the concept: de-DE for Ralf's German dictionary; de-AT for Ralf's Austrian German dictionary).
3. Ralf's Swiss German dictionary should become a sister project of Ralf's German dictionary. I hope that someone from Switzerland is willing to improve Ralf's Swiss German dictionary.
4. The encoding of de_CH_frami.dic and de_CH_frami.aff is ISO-8859-1. I will have to convert both files into UTF-8.
5. Convert de_CH_frami.dic into UTF-8:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ iconv -f ISO8859-1 -t UTF-8 < de_CH_frami.dic > swiss.dic
6. Convert de_CH_frami.aff into UTF-8:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ iconv -f ISO8859-1 -t UTF-8 < de_CH_frami.aff > swiss.aff
In the file swiss.aff, change the line SET ISO8859-1 to SET UTF-8.
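Steps 5 and 6 can also be done in one go with a small script. The following is only a sketch in Python, not the method used above; the function name convert_aff is made up for illustration, and it assumes the SET line reads exactly "SET ISO8859-1".

```python
def convert_aff(raw):
    """Decode ISO-8859-1 bytes and point the SET declaration at UTF-8.

    Hypothetical helper: returns the file content as a Python string,
    ready to be written out with encoding="utf-8".
    """
    text = raw.decode("iso-8859-1")
    lines = []
    for line in text.split("\n"):
        # Tolerate Windows line endings when comparing the declaration.
        if line.rstrip("\r") == "SET ISO8859-1":
            line = line.replace("ISO8859-1", "UTF-8")
        lines.append(line)
    return "\n".join(lines)
```

To use it, read de_CH_frami.aff in binary mode, pass the bytes to convert_aff, and write the result to swiss.aff with encoding="utf-8". For the .dic file, the decode step alone is enough.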
7. Trying to generate a list with Swiss German words:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ unmunch swiss.dic swiss.aff > swiss-wordlist
Unfortunately, the result is not usable, so I will have to find a different way. I think that I will use swiss.dic as the source. Unfortunately, a lot of nouns in swiss.dic are written in lower-case (in German, nouns are always capitalized). Never mind, this has to be fixed later.
8. Add <lexicon> at the beginning of the file swiss.dic. Add </lexicon> at the end of swiss.dic.
9. Create XML file:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ saxonb-xslt -s:swiss.dic -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:swiss.xml
10. Remove substring after slash:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ saxonb-xslt -s:swiss.xml -xsl:'substring-before-slash.xsl' -o:swiss-ssml.xml
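For readers without the substring-before-slash.xsl style-sheet, the effect of step 10 can be sketched in Python. This is an illustration, not the style-sheet itself. It assumes the usual Hunspell .dic layout (an optional word count on the first line, then one entry per line, with affix flags after a slash); the function name words_from_dic is hypothetical.

```python
def words_from_dic(lines):
    """Return the bare words from .dic entries, dropping '/FLAGS' suffixes."""
    words = []
    for index, line in enumerate(lines):
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if index == 0 and entry.isdigit():
            continue  # assumed: the first line holds the entry count
        # Keep only the part before the first slash.
        words.append(entry.split("/", 1)[0])
    return words
```

Entries without a slash pass through unchanged, so the function can be run over the whole file.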
11. Generate Swiss eSpeak phonemes:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ espeak -f swiss-ssml.xml -m -v de -q -x --phonout="swiss-espeak"
12. Open the file swiss-espeak with Geany. Replace the sequence "\n\n " (backslash-n-backslash-n-spacebar) with the sequence "</phoneme>\n<phoneme>" (with the "Use escape sequences" option enabled).

13. Add <lexicon> at the beginning of the file swiss-espeak. Add </lexicon> at the end.
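Steps 12 and 13 can be approximated without an editor. The sketch below assumes that the swiss-espeak file has the shape implied by step 12: each phoneme string starts with a space and the strings are separated by an empty line. That shape is an assumption about the eSpeak output, and the function name wrap_phonemes is made up.

```python
def wrap_phonemes(espeak_output):
    """Turn blank-line-separated phoneme strings into <phoneme> elements."""
    # Drop the leading space of the first entry and the trailing newline.
    body = espeak_output.strip()
    # Same replacement as in step 12: entry separator becomes a tag boundary.
    body = body.replace("\n\n ", "</phoneme>\n<phoneme>")
    # Step 13, plus the opening and closing tags that the replacement
    # cannot produce for the first and last entry.
    return "<lexicon>\n<phoneme>" + body + "</phoneme>\n</lexicon>\n"
```

Unlike the manual search and replace, this also closes the first and last element, which otherwise has to be done by hand.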
14. Paste <grapheme> and <phoneme> elements:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ paste swiss-ssml.xml swiss-espeak > swiss-pls
15. Edit swiss-pls with Geany so that it will become a valid PLS dictionary.
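The pasting and manual editing of steps 14 and 15 could also be scripted. The following Python sketch pairs graphemes with phonemes and writes one lexeme element per pair. It is an assumption-laden illustration: the function name build_lexicon is hypothetical, and the output is only the bare lexicon/lexeme skeleton shown in this post, not a complete PLS header.

```python
from xml.sax.saxutils import escape


def build_lexicon(graphemes, phonemes, lang="de-CH"):
    """Pair each grapheme with its phoneme string inside a <lexicon>."""
    if len(graphemes) != len(phonemes):
        raise ValueError("graphemes and phonemes must have the same length")
    lines = ['<lexicon xml:lang="%s">' % lang]
    for grapheme, phoneme in zip(graphemes, phonemes):
        lines.append("<lexeme>")
        # escape() protects against '&', '<' and '>' in the data.
        lines.append("<grapheme>%s</grapheme>" % escape(grapheme))
        lines.append("<phoneme>%s</phoneme>" % escape(phoneme))
        lines.append("</lexeme>")
    lines.append("</lexicon>")
    return "\n".join(lines) + "\n"
```

The length check matters here: paste silently pairs whatever lines it gets, so one missing phoneme line shifts every later entry.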
16. Convert some eSpeak phonemes into IPA phonemes:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ saxonb-xslt -s:'swiss-pls' -xsl:'http://spirit.blau.in/simon/files/2010/05/ralfs-ipa-stylesheet.xsl' -o:'swiss-dictionary.xml'
17. I need to improve create-ralfs-ipa-stylesheet.xsl, because several German phonemes aren’t converted yet: at the moment, ralfs-ipa-stylesheet.xsl contains almost no eSpeak-to-IPA conversion rules. Here are the XPath expressions that have to be specified for the German language:
matches(/lexicon/@xml:lang, 'de')
replace($espeak2ipa, '3', '3')
replace($espeak2ipa, '@', 'ə')
replace($espeak2ipa, '@-', 'ə-')
replace($espeak2ipa, 'a', 'a')
replace($espeak2ipa, 'A', 'A')
replace($espeak2ipa, 'A:', 'A:')
replace($espeak2ipa, 'aI', 'aI')
replace($espeak2ipa, 'aU', 'aU')
replace($espeak2ipa, 'E', 'ɛ')
replace($espeak2ipa, 'E2', 'ɛ2')
replace($espeak2ipa, 'E:', 'ɛ:')
replace($espeak2ipa, 'e:', 'eː')
replace($espeak2ipa, 'EI', 'ɛɪ̯')
replace($espeak2ipa, 'I', 'I')
replace($espeak2ipa, 'i2', 'i2')
replace($espeak2ipa, 'i:', 'iː')
replace($espeak2ipa, 'O', 'O')
replace($espeak2ipa, 'o:', 'oː')
replace($espeak2ipa, 'OY', 'OY')
replace($espeak2ipa, 'U', 'U')
replace($espeak2ipa, 'u:', 'uː')
replace($espeak2ipa, 'W', 'W')
replace($espeak2ipa, 'y', 'y')
replace($espeak2ipa, 'y:', 'y:')
replace($espeak2ipa, 'Y:', 'Y:')
replace($espeak2ipa, '\*', '*')
replace($espeak2ipa, ':', ':')
replace($espeak2ipa, ';', ';')
replace($espeak2ipa, 'b', 'b')
replace($espeak2ipa, 'C', 'C')
replace($espeak2ipa, 'd', 'd')
replace($espeak2ipa, 'D', 'D')
replace($espeak2ipa, 'dZ', 'dZ')
replace($espeak2ipa, 'f', 'f')
replace($espeak2ipa, 'g', 'g')
replace($espeak2ipa, 'g#', 'g#')
replace($espeak2ipa, 'h', 'h')
replace($espeak2ipa, 'j', 'j')
replace($espeak2ipa, 'k', 'k')
replace($espeak2ipa, 'l', 'l')
replace($espeak2ipa, 'm', 'm')
replace($espeak2ipa, 'n', 'n')
replace($espeak2ipa, 'N', 'N')
replace($espeak2ipa, 'p', 'p')
replace($espeak2ipa, 'pF', 'pF')
replace($espeak2ipa, 'r', 'r')
replace($espeak2ipa, 's', 's')
replace($espeak2ipa, 'S', 'S')
replace($espeak2ipa, 't', 't')
replace($espeak2ipa, 'tS', 'tS')
replace($espeak2ipa, 'ts', 'ts')
replace($espeak2ipa, 'v', 'v')
replace($espeak2ipa, 'w', 'w')
replace($espeak2ipa, 'x', 'x')
replace($espeak2ipa, 'z', 'z')
replace($espeak2ipa, 'Z', 'Z')
I won’t do a direct conversion from eSpeak phonemes to SAMPA phonemes. I want a conversion from eSpeak phonemes to IPA phonemes. During the PLS simon import process, the IPA phonemes will be transformed into SAMPA phonemes.
18. Do you see the XPath expression replace($espeak2ipa, 'tS', 'tS')? I think that maybe this indicates the voiceless alveolar affricate (e.g. "zehn" [t͡seːn]). Am I right? Or am I wrong? There is another XPath expression: replace($espeak2ipa, 'ts', 'ts'). At the moment, I am not sure which one indicates the voiceless alveolar affricate. Probably, eSpeak [ts] stands for IPA [t͡s]. And probably, eSpeak [tS] stands for the voiceless palato-alveolar affricate [t͡ʃ]. This means that I can add the following XPath expressions to create-ralfs-ipa-stylesheet.xsl:
replace($current-ipa, 'tS', 't͡ʃ')
replace($current-ipa, 'ts', 't͡s')
By the way, the current version of Ralf's German dictionary doesn’t contain the phones [t͡s] and [t͡ʃ]. Maybe I will add both phones to the next version of Ralf's German dictionary. At least, I will add the [t͡s] phone.
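A side note on rule order: the list in step 17 contains symbols that are prefixes of other symbols (E, E:, E2 and EI), so if such rules were applied one after another, a short rule running first could consume part of a longer symbol. The Python sketch below is not the style-sheet; it only illustrates one way around this, by trying longer symbols first in a single pass. The function name and the reduced rule table are illustrative, and the IPA values are taken from the expressions in this post.

```python
import re

# Reduced, illustrative rule table (eSpeak symbol -> IPA).
RULES = {
    "tS": "t͡ʃ",
    "ts": "t͡s",
    "E:": "ɛː",
    "e:": "eː",
    "E": "ɛ",
    "@": "ə",
}


def espeak_to_ipa(text, rules=RULES):
    """Replace eSpeak symbols with IPA, longest symbol first, in one pass."""
    symbols = sorted(rules, key=len, reverse=True)
    # re.escape keeps symbols such as '*' or '@' from acting as regex syntax.
    pattern = re.compile("|".join(re.escape(symbol) for symbol in symbols))
    return pattern.sub(lambda match: rules[match.group(0)], text)
```

Because the match is case-sensitive, ts and tS stay distinct, and E: is converted as a whole before the single-letter rule for E gets a chance to match.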
19. I hope that you understand the strength of create-ralfs-ipa-stylesheet.xsl: I add the XPath expression replace($current-ipa, 'ts', 't͡s') to create-ralfs-ipa-stylesheet.xsl. This will influence PLS dictionaries that have one of the following xml:lang language codes: ca (Catalan), hu (Hungarian), de (Standard German, Swiss German, Austrian German), el (Greek), eo (Esperanto), hbs (I haven’t created a PLS dictionary with this language code), hy (Armenian – no PLS dictionary; I can use this spelling dictionary), it, lv, mk, pl, pt (espeak offers pt and pt-pt; probably both dialects use the same phone set), ru, sk, sq (Albanian – this is a dictionary that I should create using this spelling dictionary), and many more.
So I am adding the XPath expression replace($current-ipa, 'ts', 't͡s') once, and more than 10 PLS dictionaries will be affected.
20. Let’s be more specific, and take a look into my Swiss German PLS dictionary with eSpeak phonemes:
<lexeme>
<grapheme>Abklatsch</grapheme>
<phoneme>_!'apkl,atS</phoneme>
</lexeme>
And now, take a look at my “flagship” Ralf’s German dictionary:
<lexeme role="Substantiv">
<grapheme>Abklatsch</grapheme>
<phoneme>ʔapklatʃ</phoneme>
</lexeme>
The final result in both dictionaries (Ralf's Swiss German dictionary and Ralf's German dictionary) should be: <phoneme>ʔapklat͡ʃ</phoneme>. I will achieve this goal by adjusting create-ralfs-ipa-stylesheet.xsl for Swiss German (de-CH). In contrast, Ralf's German dictionary (de-DE) already contains IPA phonemes. For this Standard German dictionary, I am using a specific .xsl style-sheet.
21. You can see that I am working as abstractly as possible (one .xsl style-sheet for all eSpeak languages), and as concretely as necessary. The phoneme adjustments are done at the appropriate level:
a. Abstract style-sheet for all eSpeak languages: create-ralfs-ipa-stylesheet.xsl
b. Concrete adjustments for a specific eSpeak language can be done here: ralfs-ipa-stylesheet.xsl
c. Fine-tuning for Standard German: improve-german-dictionary.xsl
A native speaker from Switzerland could develop a specific .xsl style-sheet for the fine-tuning of the Swiss German PLS dictionary.
22. Adding the XPath expression replace($current-ipa, 'E:', 'ɛː') to create-ralfs-ipa-stylesheet.xsl.
23. What should I do with the expression replace($espeak2ipa, 'OY', 'OY')? I should follow the Wiktionary:
[ɔɪ̯] U+0254, U+026A, U+032F Heu /[hɔɪ̯]/, Läufer /[ˈlɔɪ̯fɐ]/, neu /[nɔɪ̯]/
The simon import process accepts the phone [ɔɪ̯]. According to Wikipedia, the following transcription would also be possible:
Instead of the transcription /ɔ͡ʏ/, the transcription /ɔ͡ɪ/ is used as well.
You can see that there are several possible solutions. We have to decide which solution we want to use for a specific language. For Standard German, we use [ɔɪ̯]. Other languages may use a different transcription for the diphthong [ɔɪ̯].
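The idea of shared rules plus per-language decisions can be sketched as a small lookup. This is a Python illustration under assumptions, not the logic of the style-sheets: the table names and the function rules_for are made up, and only the values mentioned in this post are filled in. Note that it compares the primary language subtag exactly, whereas an unanchored XPath matches(..., 'de') would also accept any language code that merely contains "de".

```python
# Rules that apply to every language (values from step 18).
SHARED_RULES = {"ts": "t͡s", "tS": "t͡ʃ"}

# Decisions made per language; only the Standard German choice is known.
LANGUAGE_RULES = {"de": {"OY": "ɔɪ̯"}}


def rules_for(xml_lang):
    """Merge the shared rules with those for the primary language subtag."""
    # 'de-CH', 'de-DE' and 'de-AT' all reduce to 'de'.
    primary = xml_lang.split("-", 1)[0].lower()
    rules = dict(SHARED_RULES)
    # Language-specific entries override shared ones on conflict.
    rules.update(LANGUAGE_RULES.get(primary, {}))
    return rules
```

A language that later decides on a different transcription for OY only needs its own entry in the per-language table; the shared rules stay untouched.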
24. And now, I learned something:
Diphthongs in German:
* [aɪ̯] as in Reich ‘empire’
* [aʊ̯] as in Maus ‘mouse’
* [ɔʏ̯] as in neu ‘new’
* [eːɐ̯] as in sehr ‘very’
* [iːɐ̯] as in dir ‘you (dative)’
* [oːɐ̯] as in Bor ‘boron (element)’
* [øːɐ̯] as in Öhr ‘eye (hole in a needle)’
* [uːɐ̯] as in nur ‘only’
* [yːɐ̯] as in Tür ‘door’
Some diphthongs in Bernese, a Swiss German dialect:
* [iə̯] as in Bier ‘beer’
* [yə̯] as in Füess ‘feet’
* [uə̯] as in Schue ‘shoes’
* [ou̯] as in Stou ‘holdup’
* [au̯] as in Stau ‘stable’
* [aːu̯] as in Staau ‘steel’
* [æu̯] as in Wäut ‘world’
* [æːu̯] as in wääut ‘elects’
* [ʊu̯] as in tschúud ‘guilty’
Great. Now we are coming to a more specific level: Bernese German (Bärndütsch). As you can see, Bernese German has its own diphthongs, so it is necessary to develop a Bernese German PLS dictionary. The plan is as follows:
- First, I develop Ralf's German dictionary for Standard German.
- Second, I develop Ralf's Swiss German dictionary for Switzerland.
- Third, someone who lives “in the Swiss plateau (Mittelland) part of the canton of Bern” should develop a Bernese German PLS dictionary.
25. Let me make one thing clear: If you speak Bernese German, you should give Ralf's German dictionary a try. Use my flagship dictionary to train a few Bernese German words. It is necessary that you understand the concept. But of course, Standard German is different from Bernese German. For good recognition results, you need your own PLS dictionary with the Bernese-specific diphthong [yə̯]. At the moment, there is no Bernese German PLS dictionary available (as far as I know). So for now you should use Ralf's Swiss German dictionary or Ralf's German dictionary.
In the long run, we need specific dialect dictionaries for the Swiss German language:
- Basel German (Baslerdüütsch) PLS dictionary
- Walliser German (Wallisertiitsch) PLS dictionary
- Walser German (Walserdeutsch) PLS dictionary
- Zürich German (Züritüütsch) PLS dictionary
For good recognition results, the PLS dictionary has to match your dialect. And this is why I am a fan of the IPA: You can be as specific as necessary. And we have a common standard that is applicable to all languages. So we use standards (UTF-8, PLS, IPA, XML, XSLT, GPLv3) that are recognized worldwide. And for each Swiss German dialect, there is a solution that can be developed.
26. I am adding the new phoneme /p͡f/ with the XPath expression replace($current-ipa, 'pF', 'p͡f'). It is similar to the voiceless labiodental affricate:
German has a similar sound in Pfeffer [ˈp͡fɛfˑɐ] (‘pepper’) and Apfel [ˈapˑ͡fl̩] (‘apple’). This /p͡f/ only occurs word-initially and behind short vowels, though it differs from a true labiodental affricate in that it starts out bilabial but then the lower lip retracts slightly for the frication.
Question: do we need this phoneme /p͡f/?
27. OK, let’s create the style-sheet:
ubuntu@ubuntu-desktop:~$ saxonb-xslt -s:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/ralfs-phonemes.xml' -xsl:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/create-ralfs-ipa-stylesheet.xsl' -o:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/ralfs-ipa-stylesheet.xsl'
28. And now, let’s transform the eSpeak phonemes into IPA phonemes:
ubuntu@ubuntu-desktop:~/Documents/201005/swiss-german/de_CH_frami$ saxonb-xslt -s:'swiss-pls' -xsl:'/home/ubuntu/Documents/201005/dict-phonemes-espeak2ipa/ralfs-ipa-stylesheet.xsl' -o:'swiss-dictionary.xml'
29. Download Ralf's Swiss German dictionary, and import it into simon.
Left column: Swiss German words. Unfortunately, a lot of nouns are written in lower-case.
Right column: Corresponding SAMPA phonemes.
You can imagine the following problem: The phonemes l aU tS p R E C ah b O k s are not perfect. Take a look at the corresponding entry in the PLS dictionary:
<lexeme>
<grapheme>Lautsprecherbox</grapheme>
<phoneme>l'aʊtʃpʀɛçɐb,ɔks</phoneme>
</lexeme>
I am not sure whether the phonemes /t/ and /ʃ/ should be treated as one single phoneme /t͡ʃ/ or not (see step 18 above).
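To make the relation between the PLS entry and the right column concrete, here is a Python sketch that reproduces the SAMPA sequence for this one word. It is not simon's import code. The mapping table is inferred from this single example only, and it assumes that the stress marks ' and , are dropped and that two-character sequences are tried before single characters; the names are hypothetical.

```python
import unicodedata

# Inferred from the Lautsprecherbox example only (IPA -> SAMPA).
_RAW_TABLE = {
    "aʊ": "aU",
    "tʃ": "tS",
    "ʀ": "R",
    "ɛ": "E",
    "ç": "C",
    "ɐ": "ah",
    "ɔ": "O",
}
# Normalize keys so composed and decomposed input compare equal.
IPA_TO_SAMPA = {unicodedata.normalize("NFC", k): v for k, v in _RAW_TABLE.items()}
STRESS_MARKS = {"'", ","}


def ipa_to_sampa(ipa):
    """Convert an IPA string to space-separated SAMPA phones (sketch)."""
    ipa = unicodedata.normalize("NFC", ipa)
    phones = []
    i = 0
    while i < len(ipa):
        if ipa[i] in STRESS_MARKS:
            i += 1  # assumed: stress marks are not phones
        elif ipa[i:i + 2] in IPA_TO_SAMPA:
            phones.append(IPA_TO_SAMPA[ipa[i:i + 2]])
            i += 2
        else:
            # Characters without a table entry are passed through as-is.
            phones.append(IPA_TO_SAMPA.get(ipa[i], ipa[i]))
            i += 1
    return " ".join(phones)
```

In this sketch, /t/ followed by /ʃ/ already comes out as the single phone tS, because the two-character lookup runs first.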
30. I changed the code of Ralf's Swiss German dictionary:
<lexeme>
<grapheme>Lautsprecherbox</grapheme>
<phoneme>l'aʊt͡ʃpʀɛçɐb,ɔks</phoneme>
</lexeme>
I did not know what the result would look like when I imported this into simon, so I tested it. It makes no difference for the end-user.