In this article, I explain how to import Ralf’s German dictionary into simon and describe some of the dictionary’s properties.
1. Select Applications > Universal Access > simon.
2. Press the Word list button.
3. Press Import Dictionary.
4. Choose the target: the shadow dictionary or the active dictionary. Which is the right choice? For dictionary development, I often choose the active dictionary, because it gives me a dictionary in HTK-compatible format that I can use together with sam. Here, however, choose the shadow dictionary as the target.
5. Press the Next > button.
6. Choose the lexicon type: Hadifix, HTK, PLS (Pronunciation Lexicon Specification), or Sphinx. Select PLS.
7. Press the Next > button.
8. Before you can continue, you need a local copy of the dictionary. Open it in Firefox: http://script.blau.in/xml/german.xml
9. Do not use Save Page As... at this point. I tried it: the rendered page is saved as an HTML file instead of the original XML. You have to take a different route.
10. Select View Page Source.
11. You can now see the source of the page http://script.blau.in/xml/german.xml.
12. The page is encoded in UTF-8. UTF-8 can represent every Unicode character, so text in languages such as Hebrew, as well as German characters such as ä, ö, ü and ß, is processed correctly.
13. Note the address of the style sheet: http://script.blau.in/xml/ralf-german-dictionary.xsl. This XSL style sheet controls how Ralf’s German dictionary is displayed when you view it in Firefox.
14. The dictionary is licensed under the GPL, so anyone may extend it. It would be great if someone expanded the German dictionary.
15. The dictionary follows the PLS tree structure: a lexicon root element contains lexeme elements, and each lexeme pairs a grapheme (the written word) with a phoneme (its pronunciation).
16. Now, in the page source view, select Save Page As... and save the file.
17. Back in simon’s import wizard, choose the location of the dictionary file you just saved. On my computer, the XML file is located at /home/liberty/200909/german.xml.
18. Ralf’s German dictionary has been imported successfully.
19. Press the Finish button.
20. To take a look at the imported dictionary, select Include unused words from the shadow lexicon.
21. Drag and drop the word Maschine into the white area.
22. You want to train the word Maschine.
23. Press the button to start the training.
24. At this point, Maschine exists only in the shadow lexicon, not in the active lexicon. Press Yes to add it to the active lexicon.
25. Next, define the pronunciation of Maschine. The pronunciation is displayed in SAMPA (Speech Assessment Methods Phonetic Alphabet).
26. I find the concept of terminals difficult; it is explained in the simon handbook. Here, I use the terminal Unknown.
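Instead of going through View Page Source and Save Page As..., you could fetch the raw XML bytes directly; a plain HTTP download never applies the XSL style sheet, so nothing gets rendered to HTML. The following is a minimal sketch using only the Python standard library. It is not part of simon: `save_raw` is a hypothetical helper, and the sketch assumes the dictionary is still served at http://script.blau.in/xml/german.xml.

```python
import shutil
import urllib.request
from pathlib import Path


def save_raw(url: str, target: Path) -> int:
    """Copy the bytes behind `url` into `target`, untouched by any style sheet.

    Hypothetical helper. Returns the size of the written file in bytes.
    """
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        # Bytes are copied unchanged, so the UTF-8 encoding is preserved.
        shutil.copyfileobj(response, out)
    return Path(target).stat().st_size


if __name__ == "__main__":
    # Assumption: the dictionary is still available at this address.
    url = "http://script.blau.in/xml/german.xml"
    target = Path.home() / "german.xml"
    size = save_raw(url, target)
    print(f"{size} bytes written to {target}")
```

You would then select the saved file in simon’s import wizard, just as with a file saved from Firefox.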
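If you want to inspect the dictionary outside simon, for example to look up the SAMPA pronunciation of Maschine before training it, the lexicon/lexeme/grapheme/phoneme structure described in the steps above can be read with Python’s standard `xml.etree.ElementTree` module. This is a sketch under assumptions: `load_pls` and `local_name` are hypothetical names, the two sample entries and their SAMPA transcriptions are made up for illustration, and the code does not assume whether german.xml declares the W3C PLS namespace (it strips any namespace before comparing element names).

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    """Strip a namespace such as '{http://...}lexeme' down to 'lexeme'."""
    return tag.rsplit("}", 1)[-1]


def load_pls(source) -> dict[str, list[str]]:
    """Map each grapheme (written word) to its list of phoneme strings.

    `source` is a file path or a binary file object. ElementTree decodes
    the file according to its XML declaration (UTF-8 for german.xml).
    """
    root = ET.parse(source).getroot()
    lexicon: dict[str, list[str]] = defaultdict(list)
    for lexeme in root.iter():
        if local_name(lexeme.tag) != "lexeme":
            continue
        children = list(lexeme)
        graphemes = [c.text.strip() for c in children
                     if local_name(c.tag) == "grapheme" and c.text]
        phonemes = [c.text.strip() for c in children
                    if local_name(c.tag) == "phoneme" and c.text]
        # In PLS, all pronunciations of a lexeme apply to each of its graphemes.
        for grapheme in graphemes:
            lexicon[grapheme].extend(phonemes)
    return dict(lexicon)


# Hypothetical two-entry sample; the real german.xml is much larger and its
# SAMPA transcriptions may differ from these.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" alphabet="x-sampa" xml:lang="de"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme><grapheme>Maschine</grapheme><phoneme>m a S i: n @</phoneme></lexeme>
  <lexeme><grapheme>Straße</grapheme><phoneme>S t r a: s @</phoneme></lexeme>
</lexicon>
""".encode("utf-8")

if __name__ == "__main__":
    lexicon = load_pls(io.BytesIO(SAMPLE))
    print(lexicon["Maschine"])
    # To read a saved copy instead: load_pls("/home/liberty/200909/german.xml")
```

Returning a list of phonemes per word, rather than a single string, keeps alternative pronunciations that a lexeme may contain.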
That’s all for now. If you want to know more about simon, read the simon handbook (PDF).
You should now be able to import Ralf’s German dictionary into simon.