Currently, the German PLS lexicon contains about 8,000 entries. In my view, that is big enough to serve as the basis for generating a much bigger lexicon automatically. The goal could be a lexicon ten times that size, i.e. about 80,000 entries.
Let me give you an example:
Baustellen baʊ̯ʃtɛlən
Stellen ʃtɛlən
feststellen fɛstʃtɛlən
festzustellen fɛstsʊʃtɛlən
herstellen heːɐ̯ʃtɛlən
herzustellen heːɐ̯tsʊʃtɛlən
stellen ʃtɛlən
You can see that there are 7 entries which contain ʃtɛlən. We need more of them, e.g. bestellen, zustellen, aufstellen, einstellen, ausstellen, vorstellen. Why not generate them automatically?
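To make the idea a bit more concrete, here is a minimal Python 3 sketch that uses only the seven entries listed above. It splits every entry ending in the stem into a written prefix and a spoken prefix, and it lists which of the wanted words are still missing. The names LEXICON, split_prefixes and missing are made up for this illustration and are not part of the PLS lexicon or of Sequitur G2P. Simple string matching like this is only a first look at the data and no replacement for a trained G2P model.

```python
# Example entries copied from the list above: word -> IPA transcription.
LEXICON = {
    "Baustellen": "baʊ̯ʃtɛlən",
    "Stellen": "ʃtɛlən",
    "feststellen": "fɛstʃtɛlən",
    "festzustellen": "fɛstsʊʃtɛlən",
    "herstellen": "heːɐ̯ʃtɛlən",
    "herzustellen": "heːɐ̯tsʊʃtɛlən",
    "stellen": "ʃtɛlən",
}

STEM_WORD = "stellen"
STEM_IPA = "ʃtɛlən"


def split_prefixes(lexicon, stem_word, stem_ipa):
    """Return {written prefix: spoken prefix} for entries ending in the stem."""
    prefixes = {}
    for word, ipa in lexicon.items():
        lowered = word.lower()
        if lowered.endswith(stem_word) and ipa.endswith(stem_ipa):
            written = lowered[: len(lowered) - len(stem_word)]
            spoken = ipa[: len(ipa) - len(stem_ipa)]
            if written:  # skip the bare stem itself
                prefixes[written] = spoken
    return prefixes


def missing(lexicon, candidates):
    """Return the candidate words that have no entry in the lexicon yet."""
    known = {word.lower() for word in lexicon}
    return [c for c in candidates if c.lower() not in known]


PREFIXES = split_prefixes(LEXICON, STEM_WORD, STEM_IPA)
WANTED = ["bestellen", "zustellen", "aufstellen",
          "einstellen", "ausstellen", "vorstellen"]
TODO = missing(LEXICON, WANTED)

if __name__ == "__main__":
    for written, spoken in sorted(PREFIXES.items()):
        print(written, spoken)
    print("still missing:", ", ".join(TODO))
```

The prefixes found this way (for example fest and her) could later be combined with other stems that are already in the lexicon, but every generated entry would still need to be checked by hand.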
The PLS dictionary is published under the GPL, so expanding it with Sequitur G2P would be permitted. To use Sequitur G2P you need Python (already installed on my Ubuntu machine), NumPy, and SWIG (which I installed with the command sudo apt-get install swig).
I just read that Sequitur G2P apparently uses the expectation-maximization (EM) algorithm (PDF, page 21). Wikipedia has an entry on the EM algorithm. I think that Sequitur G2P could be very helpful.
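As far as I understand the Sequitur G2P documentation, the training lexicon is a plain text file with one word per line, followed by its phonemes, all separated by white space. The PLS entries store each transcription as one unbroken IPA string, so they would have to be split into phonemes first. The following Python 3 sketch shows one possible way to do that. It rests on an assumption of mine: every combining diacritic and the length mark ː belongs to the symbol before it. The function names ipa_to_phonemes and training_line are hypothetical. Note that this simple rule splits a diphthong such as aʊ̯ into two symbols, which may or may not be what you want.

```python
import unicodedata

LENGTH_MARK = "\u02d0"  # IPA length mark, as in "e" + length mark


def ipa_to_phonemes(ipa):
    """Split an IPA string into symbols, keeping diacritics with their base."""
    phonemes = []
    for ch in ipa:
        # Combining marks and the length mark modify the preceding symbol.
        attaches = unicodedata.combining(ch) != 0 or ch == LENGTH_MARK
        if attaches and phonemes:
            phonemes[-1] += ch
        else:
            phonemes.append(ch)
    return phonemes


def training_line(word, ipa):
    """Build one line: the word, then its phonemes, separated by spaces."""
    return word + " " + " ".join(ipa_to_phonemes(ipa))


if __name__ == "__main__":
    print(training_line("stellen", "ʃtɛlən"))
    print(training_line("herstellen", "heːɐ̯ʃtɛlən"))
```

Writing one such line per lexicon entry into a text file should give something Sequitur G2P can be trained on, but I have not tested that.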
The words that are not in the PLS dictionary are something that could perhaps be described with a hidden Markov model (HMM). So it should be possible to compute many more words. Maybe there is someone out there who would like to help?
I just downloaded numpy-1.3.0.tar.gz, but I don’t know how to install it.
By the way, Timo mentioned Sequitur G2P.