Import 40,000 Portuguese words

You can import Ralf's Portuguese (European) dictionary (version 0.1; GPLv3) into simon. Training with this dictionary is currently not recommended.

To create this dictionary, I downloaded a spelling dictionary and then generated the phonemes for its words.
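
For readers unfamiliar with the PLS (Pronunciation Lexicon Specification) format, here is a minimal Python sketch of how word/phoneme pairs could be written out as a PLS 1.0 lexicon using only the standard library. It is an illustration, not the actual script behind this dictionary: the phoneme generation step is not shown, the function name build_pls is hypothetical, and the sample IPA transcriptions are only illustrative. The namespace and the version, alphabet and xml:lang attributes follow the W3C PLS 1.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespaces from the W3C Pronunciation Lexicon Specification (PLS) 1.0
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize PLS elements with a default namespace instead of an "ns0:" prefix
ET.register_namespace("", PLS_NS)


def build_pls(entries, lang):
    """Return a PLS lexicon as a string for an iterable of (word, ipa) pairs."""
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    for word, ipa in entries:
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = word
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = ipa
    return ET.tostring(lexicon, encoding="unicode")


# Illustrative entries; in practice the phonemes come from a separate generation step
print(build_pls([("casa", "ˈkazɐ"), ("gato", "ˈɡatu")], "pt-pt"))
```

Each word becomes one lexeme element holding its grapheme (spelling) and phoneme (IPA transcription).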

This dictionary contains information about the primary stress. This information will be automatically removed when importing the dictionary into simon. From my point of view, we don’t need stress information for ASR.
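
Removing the stress information is straightforward when it is encoded with the IPA stress characters. The following Python sketch assumes that primary stress is written as ˈ (U+02C8) and secondary stress as ˌ (U+02CC); it is not simon's actual import code, and strip_stress is a hypothetical name.

```python
# IPA primary stress mark (U+02C8) and secondary stress mark (U+02CC),
# mapped to None so str.translate deletes them
STRESS_MARKS = {ord("\u02c8"): None, ord("\u02cc"): None}


def strip_stress(ipa: str) -> str:
    """Return the IPA transcription with all stress marks removed."""
    return ipa.translate(STRESS_MARKS)


print(strip_stress("ˈkazɐ"))  # → kazɐ
```

All other characters, including length marks, are left untouched.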

The issues with this dictionary are similar to those of the Catalan and Spanish dictionaries. Nevertheless, at least I can now offer you a PLS dictionary for the Portuguese language.

Language-tag-dependent import

It might be worth thinking about a language-specific dictionary import. For example, Ralf's Portuguese (European) dictionary contains the following attribute in its lexicon element: xml:lang="pt-pt"

This means European Portuguese (I am not entirely sure whether this language code is correct; if it is wrong, it can be corrected later). My other dictionaries have the following language tags:

Ralf's French dictionary: xml:lang="fr" (I think that the current version has the wrong language tag; I will check that later.)
Ralf's Spanish dictionary: xml:lang="es"
Ralf's German dictionary: xml:lang="de"
Ralf's Austrian German dictionary: xml:lang="de-AT"
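
A language-tag-dependent import would first need to read the tag and map it to a rule set. Under BCP 47, language tags are case-insensitive, so "pt-pt" and "pt-PT" denote the same tag; the conventional spelling capitalizes the region subtag. The Python sketch below reads xml:lang from a PLS lexicon element, normalizes the case, and falls back from a regional tag such as "de-AT" to "de" when no region-specific rules exist, similar in spirit to the RFC 4647 lookup scheme. The function names and the RULES dictionary are hypothetical and not part of simon.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lang(pls_text: str) -> str:
    """Return the xml:lang attribute of a PLS lexicon root element ("" if absent)."""
    return ET.fromstring(pls_text).get(XML_LANG, "")


def normalize_tag(tag: str) -> str:
    """Apply BCP 47 case conventions: lowercase language, Title script, UPPER region."""
    parts = tag.replace("_", "-").split("-")  # also accept POSIX-style "de_AT"
    out = [parts[0].lower()]
    for p in parts[1:]:
        if len(p) == 2 and p.isalpha():
            out.append(p.upper())  # region subtag, e.g. "PT", "AT"
        elif len(p) == 4 and p.isalpha():
            out.append(p.title())  # script subtag, e.g. "Latn"
        else:
            out.append(p.lower())
    return "-".join(out)


def lookup_rules(tag, rules):
    """Return rules for the most specific matching tag, dropping subtags from the end."""
    parts = normalize_tag(tag).split("-")
    while parts:
        key = "-".join(parts)
        if key in rules:
            return rules[key]
        parts.pop()
    return None


SAMPLE = ('<lexicon xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
          'version="1.0" alphabet="ipa" xml:lang="pt-pt"/>')
RULES = {"pt": "pt rules", "de": "de rules", "de-AT": "de-AT rules"}  # placeholders

print(normalize_tag(read_lang(SAMPLE)))  # → pt-PT
print(lookup_rules("pt-pt", RULES))      # → pt rules
print(lookup_rules("de-AT", RULES))      # → de-AT rules
print(lookup_rules("fr", RULES))         # → None
```

With this fallback, the Austrian German dictionary could reuse the German rules until dedicated "de-AT" rules exist.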

Maybe a future version of simon could transform the IPA phonemes into SAMPA using language-specific conversion rules.
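
As a rough sketch of what such rules might look like, the Python code below converts IPA strings to SAMPA with per-language lookup tables and greedy longest-match replacement. The tables are deliberately partial and hypothetical: they cover only a few symbols whose SAMPA equivalents are standard (for example ʃ → S and ø → 2), and a real converter would need the full SAMPA inventory for each language. The names SAMPA_RULES and ipa_to_sampa are illustrative only.

```python
# Hypothetical, deliberately partial IPA -> SAMPA tables
COMMON = {
    "ʃ": "S", "ʒ": "Z", "ŋ": "N", "ɛ": "E", "ɔ": "O", "ɡ": "g",
    "ə": "@", "ɐ": "6", "ː": ":", "ˈ": '"', "ˌ": "%",
}
SAMPA_RULES = {
    "pt": {**COMMON, "ɲ": "J", "ʎ": "L"},
    "de": {**COMMON, "ç": "C", "ʏ": "Y", "ø": "2", "œ": "9",
           "t\u0361\u0283": "tS"},  # tie-barred affricate t͡ʃ
}


def ipa_to_sampa(ipa: str, table: dict) -> str:
    """Convert an IPA string to SAMPA using greedy longest-match replacement."""
    keys = sorted(table, key=len, reverse=True)  # try longer sequences first
    out = []
    i = 0
    while i < len(ipa):
        for k in keys:
            if ipa.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            out.append(ipa[i])  # unmapped symbols pass through unchanged
            i += 1
    return "".join(out)


print(ipa_to_sampa("ˈkazɐ", SAMPA_RULES["pt"]))  # → "kaz6
```

Longest-match ordering ensures that multi-character IPA sequences, such as the affricate t͡ʃ, are converted as a unit before their individual symbols are considered. Passing unmapped symbols through unchanged makes gaps in a table easy to spot.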
