Ralf’s German dictionary

Saturday, September 12th, 2009

In this article, I explain how to import Ralf’s German dictionary into simon and describe some of the properties of this dictionary.

1. Select Applications > Universal Access > simon.

2. Press the Word list button.
3. Press Import Dictionary.

4. You can select the target: the shadow dictionary or the active dictionary. Which is the right choice? For dictionary development, I often choose the active dictionary, because that gives me a dictionary in HTK-compatible format, which I use in conjunction with sam. For this walkthrough, however, choose the shadow dictionary as the target.

5. Press the Next > button.

6. You can choose between several lexicon types: Hadifix, HTK, PLS (the W3C Pronunciation Lexicon Specification), and Sphinx. Select PLS.
7. Press the Next > button.

8. The dictionary is available at http://script.blau.in/xml/german.xml. Open this address in your browser.
9. Do not use Save Page As... at this point. I tried it: the page is saved as an HTML file, not as the original XML. You need a different approach.

10. Select View Page Source.

11. You can now see the source of the page http://script.blau.in/xml/german.xml.

12. The page is encoded in UTF-8. UTF-8 can represent every Unicode character, so even text in languages like Hebrew is processed correctly. This makes UTF-8 a very good choice for dictionaries in any language.
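To see what UTF-8 does with characters outside ASCII, here is a small Python check. The sample words are my own choice (an IPA transcription of Maschine and the Hebrew word shalom); nothing here is specific to Ralf’s file.

```python
# UTF-8 stores ASCII letters in one byte and other Unicode characters
# in two to four bytes, so byte count and character count can differ.
samples = {
    "Maschine": "Maschine",  # plain ASCII spelling
    "IPA": "maʃiːnə",        # IPA transcription of Maschine (my example)
    "Hebrew": "שלום",        # Hebrew word "shalom" (my example)
}

for label, text in samples.items():
    encoded = text.encode("utf-8")
    # Decoding the bytes again gives back the identical string.
    assert encoded.decode("utf-8") == text
    print(f"{label}: {len(text)} characters, {len(encoded)} bytes")
```

The ASCII spelling needs exactly one byte per character, while the IPA and Hebrew strings need more bytes than characters, and all three survive the round trip unchanged.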

13. Note the address of the style sheet: http://script.blau.in/xml/ralf-german-dictionary.xsl. This XSL style sheet controls how Ralf’s German dictionary is displayed when you view it in Firefox.
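A browser finds such a style sheet through an xml-stylesheet processing instruction at the top of the XML file. The following Python sketch reads that instruction with the standard library. The document head is a shortened, hypothetical example: I assume Ralf’s file links its style sheet this way, but I have not reproduced the real file here.

```python
from xml.dom import minidom

# Hypothetical, shortened document head (assumption, not the real file).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://script.blau.in/xml/ralf-german-dictionary.xsl"?>
<lexicon/>
"""


def stylesheet_instructions(xml_bytes: bytes) -> list[str]:
    """Return the data part of every top-level xml-stylesheet instruction."""
    document = minidom.parseString(xml_bytes)
    return [
        node.data
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


for data in stylesheet_instructions(SAMPLE):
    print(data)
```

The XML declaration in the first line is not a processing instruction node, so only the style sheet link is reported.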

14. The dictionary is licensed under the GPL. It would be great if someone expanded it.

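15. The dictionary has a tree structure built from the elements lexicon, lexeme, grapheme, and phoneme: the root element lexicon contains the lexeme entries, and each lexeme holds the grapheme (the written form) and the phoneme (the pronunciation) of a word.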
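To make the tree structure concrete, here is a minimal Python sketch that reads lexeme entries with ElementTree. The two-entry excerpt is hypothetical: the W3C PLS namespace, the alphabet="ipa" attribute, and the pronunciations are my assumptions, not copied from Ralf’s file. The code compares local element names only, so it works whether or not the file declares a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; namespace, alphabet and phonemes are assumptions.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="de">
  <lexeme>
    <grapheme>Maschine</grapheme>
    <phoneme>maʃiːnə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Haus</grapheme>
    <phoneme>haʊs</phoneme>
  </lexeme>
</lexicon>
""".encode("utf-8")


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def read_lexemes(xml_bytes: bytes) -> dict[str, list[str]]:
    """Map each grapheme (spelling) to the phonemes of its lexeme."""
    root = ET.fromstring(xml_bytes)
    entries: dict[str, list[str]] = {}
    for lexeme in root.iter():
        if local_name(lexeme.tag) != "lexeme":
            continue
        graphemes = [c.text.strip() for c in lexeme
                     if local_name(c.tag) == "grapheme" and c.text]
        phonemes = [c.text.strip() for c in lexeme
                    if local_name(c.tag) == "phoneme" and c.text]
        # PLS allows several graphemes per lexeme; each gets all phonemes.
        for grapheme in graphemes:
            entries.setdefault(grapheme, []).extend(phonemes)
    return entries


print(read_lexemes(SAMPLE))
```

I use a dictionary of lists because a PLS lexeme may carry more than one pronunciation for the same spelling.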

16. From the page source view, select Save Page As... and save the file.
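The browser route works, but if you prefer a script, the raw bytes can also be fetched directly. This is a minimal Python sketch using only the standard library, assuming the address is still online and the server delivers the file as plain XML. The function name save_raw_xml is my own choice.

```python
import shutil
import urllib.request

# Address of Ralf's German dictionary, as given in this article.
DICTIONARY_URL = "http://script.blau.in/xml/german.xml"


def save_raw_xml(url: str, destination: str, timeout: float = 30.0) -> int:
    """Copy the bytes served at `url` unchanged into `destination`.

    No style sheet is applied here; the file is stored exactly as the
    server sends it. Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
        return out.tell()
```

Calling save_raw_xml(DICTIONARY_URL, "german.xml") writes the file to the current directory; that path is what you then choose in simon’s import wizard.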

17. Back in simon, choose the location of the copy of Ralf’s German dictionary that you saved a few moments ago. On my computer, the XML file is located here: /home/liberty/200909/german.xml.

18. Ralf’s German dictionary has been imported successfully.
19. Press the Finish button.

20. To take a look at the imported dictionary, select Include unused words from the shadow lexicon.
21. Drag and drop the word Maschine into the white area.

22. Next, you want to train the word Maschine.
23. Press the button to start the training.

24. At this point, the word Maschine is only part of the shadow lexicon, not of the active lexicon. Press Yes to add it to the active lexicon.

25. Now define the pronunciation of the word Maschine. The pronunciation is displayed in SAMPA.
26. I find the concept of terminals difficult; it is explained in the simon handbook. I am using the terminal Unknown.
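SAMPA writes phonetic symbols with plain ASCII characters. If a pronunciation is available in IPA, for example maʃiːnə for Maschine, it can be transliterated symbol by symbol. The Python sketch below uses a deliberately partial table of German symbols; I do not know which conversion simon performs internally, so this only illustrates how the two notations relate.

```python
# Partial IPA-to-SAMPA table for German (not exhaustive).
IPA_TO_SAMPA = {
    "ʃ": "S", "ʒ": "Z", "ç": "C", "ŋ": "N", "ʁ": "R", "ɡ": "g",
    "ɪ": "I", "ɛ": "E", "ɔ": "O", "ʊ": "U", "ʏ": "Y",
    "ø": "2", "œ": "9", "ə": "@", "ɐ": "6",
    "ː": ":",   # length mark
    "ˈ": '"',   # primary stress
}


def ipa_to_sampa(ipa: str) -> str:
    """Transliterate an IPA string into SAMPA, one character at a time.

    Characters missing from the table (plain letters such as m, a, i, n)
    are copied unchanged. Combining diacritics are also copied unchanged,
    so a full converter would need more rules.
    """
    return "".join(IPA_TO_SAMPA.get(symbol, symbol) for symbol in ipa)


print(ipa_to_sampa("maʃiːnə"))
```

With this table, ipa_to_sampa("maʃiːnə") returns maSi:n@. Note that the IPA letter ɡ (U+0261) differs from the ASCII g, which is why it needs its own entry.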

OK, I am finishing here. If you want to know more about simon, please read the simon handbook (PDF).

You should now be able to import Ralf’s German dictionary into simon.