Posts Tagged ‘PLS’

Feature request: multiple languages

Sunday, August 9th, 2009

I would like to be able to switch between the following languages: English, German, French, and Spanish. I have prepared a small sample PLS pronunciation dictionary for the French language:

<?xml version="1.0" encoding="UTF-8"?>
<!-- This pronunciation lexicon is licensed under the GPL. -->
<lexicon version="1.0" alphabet="ipa" xml:lang="fr">
<lexeme>
<grapheme>hambourg</grapheme>
<phoneme>ʔɑ̃.buʁ</phoneme>
</lexeme>
<lexeme>
<grapheme>manuscrit</grapheme>
<phoneme>ma.nys.kʁi</phoneme>
</lexeme>
<lexeme>
<grapheme>voiture</grapheme>
<phoneme>vwa.tyʁ</phoneme>
</lexeme>
<lexeme>
<grapheme>prophète</grapheme>
<phoneme>pʀɔfɛt</phoneme>
</lexeme>
<lexeme>
<grapheme>danger</grapheme>
<phoneme>dɑ̃.ʒe</phoneme>
</lexeme>
<lexeme>
<grapheme>heureuse</grapheme>
<phoneme>œ.ʁøz</phoneme>
</lexeme>
<lexeme>
<grapheme>heureusement</grapheme>
<phoneme>œ.ʁøzəmɔ̃</phoneme>
</lexeme>
<lexeme>
<grapheme>heureuses</grapheme>
<phoneme>œ.ʁøz</phoneme>
</lexeme>
<lexeme>
<grapheme>heureux</grapheme>
<phoneme>œ.ʁø</phoneme>
</lexeme>
<lexeme>
<grapheme>dangereuse</grapheme>
<phoneme>dɑ̃ʒʀøz</phoneme>
</lexeme>
<lexeme>
<grapheme>dangereux</grapheme>
<phoneme>dɑ̃ʒʀø</phoneme>
</lexeme>
<lexeme>
<grapheme>manteau</grapheme>
<phoneme>mɑ̃.to</phoneme>
</lexeme>
<lexeme>
<grapheme>manteaux</grapheme>
<phoneme>mɑ̃.to</phoneme>
</lexeme>
</lexicon>

This lexicon could be saved as an .xml file and then imported into simon. But the situation is as follows: I have already imported the German PLS dictionary into simon (it works well), and of course I don’t want to mix the German dictionary with the sample French dictionary. So it would be good if it were possible to switch between different languages.

With Dragon NaturallySpeaking 9 Preferred (Win XP), I can switch between German and English. With Vista Speech Recognition (Ultimate Edition), I could switch between French and Spanish (I don’t use Vista anymore; I am trying to migrate directly to Ubuntu). It would be good if a future version of simon offered the possibility to switch between several languages. That means that simon would have to manage different dictionaries, different prompts, different wav training samples, and different mfc files for each language.
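To illustrate what keeping the languages apart could look like, here is a minimal Python sketch. It is not how simon works internally; it only reads the xml:lang attribute of a PLS document and files the entries under that language. The function name load_lexicon and the shortened sample are my own, and the sketch assumes a lexicon without a default namespace, like the sample above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the French sample lexicon (two entries only).
FRENCH_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" alphabet="ipa" xml:lang="fr">
<lexeme>
<grapheme>voiture</grapheme>
<phoneme>vwa.tyʁ</phoneme>
</lexeme>
<lexeme>
<grapheme>manteau</grapheme>
<phoneme>mɑ̃.to</phoneme>
</lexeme>
</lexicon>"""


def load_lexicon(xml_bytes):
    """Return (language, {grapheme: [phoneme, ...]}) for one PLS document."""
    root = ET.fromstring(xml_bytes)
    language = root.get(XML_LANG)
    entries = {}
    for lexeme in root.iter("lexeme"):
        phonemes = [p.text for p in lexeme.findall("phoneme")]
        for grapheme in lexeme.findall("grapheme"):
            entries.setdefault(grapheme.text, []).extend(phonemes)
    return language, entries


# One dictionary per language, so German and French never get mixed.
lexicons = {}
language, entries = load_lexicon(FRENCH_SAMPLE.encode("utf-8"))
lexicons[language] = entries
print(language, sorted(lexicons[language]))
```

A German lexicon loaded the same way would end up under the key "de", and switching languages would simply mean picking a different key.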

Sequitur G2P could expand PLS dictionary

Wednesday, July 22nd, 2009

Currently, the German PLS lexicon contains about 8000 entries. From my point of view, this lexicon is big enough to be used to generate a much bigger lexicon automatically. The goal could be to generate a lexicon that is ten times bigger. That would mean about 80,000 entries.

Let me give you an example:

Baustellen baʊ̯ʃtɛlən
Stellen ʃtɛlən
feststellen fɛstʃtɛlən
festzustellen fɛstsʊʃtɛlən
herstellen heːɐ̯ʃtɛlən
herzustellen heːɐ̯tsʊʃtɛlən
stellen ʃtɛlən

You can see that there are 7 entries which contain ʃtɛlən. We need more of them, e.g. bestellen, zustellen, aufstellen, einstellen, ausstellen, vorstellen. Why not generate them automatically?
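The idea can be sketched in a few lines of Python. This is only naive string concatenation, not what Sequitur G2P does (Sequitur trains a statistical model). The sketch learns how the prefixes in front of "stellen" are pronounced from the entries above and then glues a new prefix onto the shared stem. The transcription "bə" for the prefix "be" is my own assumption and not taken from the lexicon, so every generated entry would still need a human check.

```python
STEM_SPELLING = "stellen"
STEM_IPA = "ʃtɛlən"

# The seven entries from the example above.
KNOWN = {
    "Baustellen": "baʊ̯ʃtɛlən",
    "Stellen": "ʃtɛlən",
    "feststellen": "fɛstʃtɛlən",
    "festzustellen": "fɛstsʊʃtɛlən",
    "herstellen": "heːɐ̯ʃtɛlən",
    "herzustellen": "heːɐ̯tsʊʃtɛlən",
    "stellen": "ʃtɛlən",
}


def learn_prefixes(known):
    """Map each spelled prefix to the IPA that precedes the shared stem."""
    prefixes = {}
    for spelling, ipa in known.items():
        if spelling.lower().endswith(STEM_SPELLING) and ipa.endswith(STEM_IPA):
            head = spelling[: -len(STEM_SPELLING)]
            if head:  # skip the bare stem itself
                prefixes[head.lower()] = ipa[: -len(STEM_IPA)]
    return prefixes


def propose(prefix_to_ipa):
    """Build candidate entries by attaching prefixes to the stem."""
    return {
        spelling + STEM_SPELLING: ipa + STEM_IPA
        for spelling, ipa in prefix_to_ipa.items()
    }


prefixes = learn_prefixes(KNOWN)
# "be" -> "bə" is an assumed transcription, to be reviewed by hand.
candidates = propose({"be": "bə"})
print(prefixes)
print(candidates)
```

The learned prefixes (fest, her, festzu, herzu, bau) could in the same way be attached to other stems that are already in the lexicon.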

The PLS dictionary is published under the GPL. That means that expanding it with Sequitur G2P would be allowed. Well, you need to install Python (it was already installed on my Ubuntu machine), NumPy, and SWIG (I just installed this program with the command sudo apt-get install swig).

I just read that Sequitur G2P apparently uses the expectation-maximization (EM) algorithm (PDF, page 21). Wikipedia has an entry about the EM algorithm. I think that Sequitur G2P could be very helpful.

The words that are not contained in the PLS dictionary are something that could be described as HMM. So it should be possible to compute many more words. Maybe there is someone out there who would like to help?

I just downloaded numpy-1.3.0.tar.gz. But I don’t know how to install it.

By the way, Timo mentioned Sequitur G2P.

English pronunciation dictionaries

Tuesday, July 21st, 2009

You can get English pronunciation dictionaries from Voxforge. Probably, you can import them into simon. But it is important to mention that the ISIP.tgz and the cmu.tgz contain slightly different phonemes:

“The pronunciation dictionary used in the Tutorial and How-to is based on the ISIP Switchboard corpus (contains around 27,500 words). Whereas the QuickStart and nightly AM builds is based on version 0.6 of the CMU Pronunciation Dictionary (contains around 130,000 words). Unfortunately, the Switchboard and CMU pronunciation dictionaries use slightly different phoneme syntax. This is enough to make them incompatible from a Grammar and Acoustic Model testing perspective.”

This is very confusing. And because I want to avoid confusion, I prefer a completely different approach: IPA/PLS. Of course, this adds a layer of complexity (because Sphinx or HTK require ASCII phonemes, not IPA phonemes). But who knows the difference between Switchboard (I don’t even know who or what Switchboard is) and CMU phonemes? Almost nobody. Read the quote again, isn’t it frustrating?

Fortunately, simon is capable of importing PLS dictionaries. Sometimes problems can occur, e.g. with the phones ɛ (gɛrtnɐ) = æ (gæŋə).

There is a question in the Voxforge forum:

“3) There are many dictionary formats that
appear to have just grown from each project.
I am considering a standard feature rich format
from which existing formats may be extracted.
Is there any reference to this in the literature?”

I just found the page “What is the VoxForge phoneset?”

From my point of view, it is a good idea to use IPA/PLS, and let simon convert the phonemes automatically. Think about the future: what if you want to add dialect phonemes? You can use the IPA for that, too. Not everyone speaks Standard German, Austrian German, or American English. You can have one single PLS dictionary that contains American / British / Australian English, and you can select via XPath the nodes that best match your specific English dialect. How could you achieve that goal e.g. with cmudict? We need solutions for the future. Why not trust the W3C, which recommends the PLS?

Some advantages of the PLS:
1. Multiple pronunciations for the same orthography
2. Multiple orthographies
3. No problem with special characters (German äöüß). Even languages like Hebrew, Chinese dialects, Russian, etc. shouldn’t be a problem. Why not? Because of UTF-8. ASCII is OK for the English speaking world. But what about the rest of the world? The German language contains just a few special characters. But they appear pretty often. It can get pretty messed up when you mix documents that are encoded in different standards. I want to avoid that mess.
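The mess from point 3 is easy to reproduce. The following small Python sketch encodes the German special characters as UTF-8 and then reads the same bytes back with the wrong encoding (ISO-8859-1). The variable names are my own.

```python
text = "äöüß"

# Each of these characters needs two bytes in UTF-8.
utf8_bytes = text.encode("utf-8")

# Reading the bytes with the correct encoding restores the text.
roundtrip = utf8_bytes.decode("utf-8")

# Reading the same bytes as ISO-8859-1 turns every character into two.
misread = utf8_bytes.decode("iso-8859-1")

# Plain ASCII cannot represent the characters at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(text), len(utf8_bytes), repr(misread), ascii_ok)
```

This is why a lexicon should declare its encoding and why every tool in the chain has to respect that declaration.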

I would like to transform the German PLS dictionary with XPath, e.g.
- filter out words that contain special characters ä,ö,ü,ß;
- transform gæŋə into gɛŋə (phone reduction) – this work could be done before importing the lexicon into simon.
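Both transformations can be sketched in Python with the standard library. ElementTree only supports a small subset of XPath, so the sketch does the actual filtering in Python code; a full XPath engine (for example the one in lxml) would allow a pure XPath expression instead. The function names, the two-entry sample, and the decision to also treat the uppercase umlauts as special characters are my own assumptions.

```python
import xml.etree.ElementTree as ET

# Lowercase characters from the list above, plus uppercase umlauts (assumption).
SPECIAL = set("äöüßÄÖÜ")

SAMPLE = """<lexicon version="1.0" alphabet="ipa" xml:lang="de">
<lexeme><grapheme>Gänge</grapheme><phoneme>gæŋə</phoneme></lexeme>
<lexeme><grapheme>Frieden</grapheme><phoneme>fʀiːdən</phoneme></lexeme>
</lexicon>"""


def drop_special(root):
    """Remove every lexeme whose grapheme contains a special character."""
    for lexeme in list(root.findall("lexeme")):
        graphemes = lexeme.findall("grapheme")
        if any(SPECIAL & set(g.text or "") for g in graphemes):
            root.remove(lexeme)


def reduce_phones(root):
    """Phone reduction: replace æ with ɛ in every phoneme."""
    for phoneme in root.iter("phoneme"):
        phoneme.text = (phoneme.text or "").replace("æ", "ɛ")


# Variant 1: keep only words without special characters.
filtered = ET.fromstring(SAMPLE)
drop_special(filtered)
remaining = [g.text for g in filtered.iter("grapheme")]

# Variant 2: keep all words, but merge the two kinds of ä.
reduced = ET.fromstring(SAMPLE)
reduce_phones(reduced)
first_phoneme = reduced.find("lexeme/phoneme").text

print(remaining, first_phoneme)
```

The result could be written back with ET.tostring(root, encoding="unicode") and then imported into simon like any other PLS file.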

I may have said that before. I hope that you can tolerate a little bit of redundancy. What would be the advantage of PLS/XPath? Answer: it would be very easy to understand, and the implementation should be pretty easy even for languages like Hebrew, Asian languages. So I think it is OK to add a layer of complexity (by introducing PLS/IPA).

So I can say that the PLS is a very good choice for languages with lots of special characters. The German PLS dictionary may serve you as example. I would encourage people whose writing systems differ substantially from the English writing system to take a look at the PLS standard.

The first step is to create a lexicon that follows the PLS standard. This is a lot of work. We have done that for the German language; the lexicon contains about 8000 entries. Such a lexicon can be imported into simon, and it works – watch the video. This video is the proof that you can design a PLS dictionary in your own language (using IPA symbols), and use it for speech recognition. This should work for every language. But of course, someone has to create such a PLS dictionary first.

Ampersand (g & N @) could be compiled

Monday, July 20th, 2009

I just added the word Gänge to my active vocabulary:

gaenge-ampersand

It was successful. Obviously, the ampersand could be compiled. simon recognized the words Gänge and Gärtner correctly (OK, first it transcribed geringe instead of Gänge – my speech model needs more training).

Let’s take a look into /home/liberty/.kde/tmp-liberty-desktop/simond/a/compile/lexicon:

GYMNASIUMS [Gymnasiums] g Y m n a: z i U m s
GÄNGE [Gänge] g & N @
GÄRTNER [Gärtner] g E r t n @ r
GÄSTEN [Gästen] g E s t n=

Maybe it would be better if simon treated the different kinds of ä as if they were the same?

ɛ (gɛrtnɐ) = æ (gæŋə)

I am not sure why the PLS dictionary differentiates between ɛ and æ. Do we need this kind of differentiation?

‘Frieden’ in the PLS dictionary

Saturday, July 18th, 2009

I just wanted to add the word Frieden to the active vocabulary:

frieden

1. Define word Frieden.
2. There are two pronunciations available.
3. Apparently, there is just one pronunciation in the German PLS dictionary. Why is that? Exactly this PLS dictionary had been imported into simon. Why does simon display two pronunciations while only one pronunciation is being displayed in the browser?

Answer: take a look at the source code of the PLS dictionary:

 <lexeme>
  <grapheme>Frieden</grapheme>
  <phoneme>fʀiːdən</phoneme>
  <phoneme>fʀiːdn̩</phoneme>
 </lexeme>

To learn more about multiple pronunciations, please read Example 2.

And now, let’s take a look into the file /home/liberty/.kde/share/apps/simon/model/shadow.voca:

[...]
Friedhof f r i: t h o: f
Friedens f R i: d n= s
Friedens f R i: d @ n s
Frieden f R i: d n=
Frieden f R i: d @ n
Friede f R i: d @
freuten f r O I t n=
[...]

You can see that the shadow vocabulary contains two pronunciations for Frieden. The n= is treated as one single node. The @ n are treated as two separate nodes. The n= is similar, but not identical to the n. When writing the PLS dictionary, I distinguish between both kinds of n. But to be honest: I am not so sure why we differentiate between them.
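How the two IPA transcriptions end up as two different node sequences can be sketched as follows. I don't know simon's real conversion table; the table below contains only the pairs that can be read off the two Frieden entries above, and the function name ipa_to_ascii is my own. The important detail is that the longest symbol is matched first, so that n̩ (n plus a combining mark) becomes one node n= instead of a plain n followed by an unknown character.

```python
# Pairs observed in the Frieden entries; not simon's actual table.
IPA_TO_ASCII = {
    "n\u0329": "n=",  # syllabic n: n + combining vertical line below
    "i\u02d0": "i:",  # long i: i + length mark
    "\u0280": "R",    # ʀ
    "\u0259": "@",    # ə (schwa)
    "f": "f",
    "d": "d",
    "n": "n",
}


def ipa_to_ascii(ipa):
    """Convert an IPA string into space-separated ASCII nodes."""
    # Try longer symbols first so that multi-character phones win.
    symbols = sorted(IPA_TO_ASCII, key=len, reverse=True)
    nodes = []
    i = 0
    while i < len(ipa):
        for symbol in symbols:
            if ipa.startswith(symbol, i):
                nodes.append(IPA_TO_ASCII[symbol])
                i += len(symbol)
                break
        else:
            raise ValueError(f"no mapping for {ipa[i]!r} at position {i}")
    return " ".join(nodes)


print(ipa_to_ascii("f\u0280i\u02d0d\u0259n"))   # fʀiːdən
print(ipa_to_ascii("f\u0280i\u02d0dn\u0329"))   # fʀiːdn̩
```

A symbol that is missing from the table raises an error instead of being silently dropped, which makes gaps in the table visible.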

Confusing: Category, Terminal, Type

Sunday, May 17th, 2009

I have some remarks about the terminology simon uses. Let me explain it with a screen shot.

terminal-category

1. The word that has been trained before is ‘Aachen’.
2. The word that I am about to add is ‘Haus’.
3. This field is called Pronounciation.
4. Why isn’t the field SAMPA called Pronounciation, too? Even though I know what is meant, I find this a little bit confusing.
5. Now we come to the most important part of this blog post. The Category is ‘Unknown’. OK, the word has been imported from the PLS dictionary which doesn’t provide this information. But I found it hard to understand what is meant. I always knew that this was about grammar or so.
6. Here simon speaks about Terminal. This is obviously the same as in (5) the word Category.
7. And here the same thing is called differently: Type

So what I want to say: Why is simon using different expressions for the same things? This is confusing. The terminology for (3) and (4) should be the same. The terminology for (5), (6), and (7) should be the same.

Let me add that I never requested via email the BOMP / HADIFIX dictionary. And please don’t misunderstand the following question: what is wrong with the PLS standard? When the grammatical category is obviously important for speech recognition, why didn’t the developers of the PLS standard include a tag called <terminal>?

And let me add something else: I find the term ‘terminal’ pretty confusing. It has several meanings. It may mean suffix, or terminal symbol. For a beginner, this is confusing. Fortunately, the simon handbook explains:

“Terminal
(Grammatical category; For example: “Noun”, “Verb”, etc.)”

Great. It is so easy. You can see how important good documentation is. I don’t want to guess what is meant. I just want to read a description or explanation. Additionally, I find screen casts very helpful to get started. Unfortunately, I didn’t find out how to comfortably create a screen cast on my Ubuntu computer (under Windows, I have Wink, which works pretty well).

Describing the world with PLS/SSML

Friday, May 15th, 2009

I just read in the simon blog about XML standards. I want to reply to some of the remarks:

“this might be interesting to other readers”

I agree. That’s the reason why I started blogging about simon. I want to give some feedback to the developers. And maybe other people might be interested as well. The people need to know that simon is a project with a very high potential: open source speech recognition for the masses might become true in the near future.
This is important to know for large corporations and governments as well: should they continue to use Win XP, or should they upgrade to Win Vista (or the upcoming Windows 7)? One aspect of this decision can be: is there a speech recognition available or not? Windows Vista does have built-in speech recognition. And Ubuntu Linux? It doesn’t offer any ASR at the moment that would be sufficient. But that could change – hopefully in the not so far future – thanks to simon. So my goal is to influence decisions.

“simon does support importing PLS dictionaries”

That’s great. Why am I so into XML-based standards? Because I understand them. And I want to produce something that is of great value for others (not limited to the ASR development). Even search engines should be able to understand what is meant when I am offering SSML files. But does a search engine understand what is meant when it analyzes files that are in the HTK or Julius format? I doubt that. HTK and Julius formats are obviously very specific standards just for ASR developers. But I am thinking in a more general sense.

Let me explain what I do believe in: The world is a giant global graph: “I’ll be thinking in the graph. My flights. My friends. Things in my life.” – the inventor of the WWW says that. And I couldn’t agree more. XML is probably the best language to describe this giant graph of knowledge. This is my ideology. If XML doesn’t suit your specific needs with HTK, I understand that.
And, by the way: I don’t like to read SAMPA. I prefer the IPA when editing the pronouncing dictionary. Sometimes, I ask myself the question: why don’t they switch from SAMPA to the IPA? Why don’t they switch their homepage from ISO-8859-1 to UTF-8? OK, they are Americans. They don’t have problems with exotic characters like “äöüß”. Do they care about other languages? Probably not. We don’t live in the time of old-fashioned ASCII any more. There are more spoken languages in the world than just English. The English speaking developers may be comfortable with ASCII. A lot of modernizations would be useful (ASCII->UTF-8; HTK format->PLS; Voxforge prompts->SSML). I can’t criticize the simon developers for that. It is not their fault.

“time constraints”

I understand that there are priorities.

“export functionality is a low priority feature”

OK. From my point of view, Voxforge needs an export functionality. And the export could be done via SSML/XML (<speak> and <audio> elements). The question is: how can I train the speech collected by Voxforge with simon? My proposition is to use SSML as intermediate step. This is additional work in the short term, but in the long term we might increase our productivity.

PLS and SSML are developed by speech experts. And currently, I am convinced that it is not a wrong decision to stick to these standards. I read in the HTK book – it takes a lot of time to get involved.

“PLS standard does not allow for any terminal information”

We could add terminal information, and create a standard XML file with the tags <lexeme><grapheme><phoneme><terminal>. Maybe a future version of the PLS will suit our needs. We can use just XML and add the missing <terminal> element. I don’t know about the exotic BOMP standard; I couldn’t find an entry in Wikipedia. So I assume that BOMP is not a relevant standard. I want to use common standards that are well understood outside of the ASR development community. The W3C Speech Interface Framework offers a lot of XML-based markup languages. So people who don’t know about the specific needs of HTK/Julius but have a basic understanding of XML can immediately understand what is being offered. They don’t have to do lots of research.

I am not very familiar with HTK, and Julius. I tried several times, installed HTK, read the Voxforge tutorial. I made progress, but unfortunately I didn’t achieve sufficient skill to get through with the Voxforge HTK tutorial. Maybe I didn’t try hard enough.

“no reason to introduce new file formats”

Then I will try to develop something on my own. Currently, I am thinking about the question whether we should take a closer look at Symfony to develop an evaluation system for the Voxforge prompts. The result would be that we could deliver high quality training material for simon. By the way, I am primarily interested in dictation (not command and control). And for dictation, we need utterances to get good recognition results. Simon allows me to record just single words, not utterances. I am not convinced by that concept. Training should be done with utterances, not just single words. Voxforge made the right decision to collect utterances.

“importing of a “normal” HTK prompts file”

That would be sufficient. I would appreciate it if such a feature were implemented.

My proposition is: Voxforge (HTK prompts) -> SSML -> simon
A shorter way would be: Voxforge (HTK prompts) -> simon
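The intermediate SSML step could look roughly like the following Python sketch. It assumes that every prompts line consists of a sample name followed by the spoken words, and that the recording has the same name plus ".wav"; both are assumptions on my side, as are the function name prompts_to_ssml and the sample names. Each utterance becomes an <audio> element inside <speak>, with the transcription as the text content of the element.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def prompts_to_ssml(prompt_lines, lang="en-US", audio_ext=".wav"):
    """Turn 'sample-name WORD WORD ...' lines into one SSML document."""
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        "{%s}speak" % SSML_NS, {"version": "1.0", XML_LANG: lang}
    )
    for line in prompt_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty lines and lines without words
        sample_name, words = parts[0], parts[1:]
        audio = ET.SubElement(
            speak, "{%s}audio" % SSML_NS, {"src": sample_name + audio_ext}
        )
        audio.text = " ".join(words)
    return ET.tostring(speak, encoding="unicode")


# Hypothetical prompts, only for illustration.
ssml = prompts_to_ssml(["sample1 DIAL ONE TWO", "", "sample2 HELLO WORLD"])
print(ssml)
```

A tool that wants to train from such a file only needs to read the src attribute and the text of each <audio> element, which any XML parser can do.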

Everyone should use the shortest path. But I am thinking about the question: How can we evaluate the Voxforge prompts? Some of them should be sorted out. And how can we achieve this goal?

You see, there are several aspects. The world is not just about simon. It is about Voxforge, too.

“introduce an additional source of errors”

You were able to implement PLS import. If you don’t want to implement SSML, that would be OK.

I think that I will have to read and try the Voxforge tutorial about HTK again.

It is OK not to focus on PLS export, and SSML. Just do what you think is best for the simon project.

I hope that you now understand my point of view better than before. It is an ideological view: describing the world as a graph. Speech recognition is just a small part of this giant graph. I would like it if Voxforge offered its prompts in SSML format so that other projects could import the prompts directly. There may be projects out there that focus on speech synthesis. These projects could use the prompts, too.

P.S.: I changed the title of my blog to “testing simon”. Obviously, the developers prefer “simon” over “Simon”.

Taking a look into the lexicon

Tuesday, May 12th, 2009

Let’s take a look into the lexicon. You can download the file, then open it with Notepad++.

model-lexicon

And now, let’s take a closer look.

lexicon-computer

1. This is the path of the lexicon displayed by Notepad++.
2. The word COMPUTER is written in uppercase.
3. Two tabs as separator.
4. The word Computer is written in uppercase (first letter) and in lowercase (rest of the word). It is embraced by square brackets.
5. Again two tabs as separator.
6. Each phoneme of the word is separated by a space as separator.

So, I have some questions:

- Why is the word COMPUTER written in uppercase? Why not mixed letters?
- Why is the word Computer available in brackets? Why two times? What is the value-add?
- Is this lexicon in HTK compatible format?
- Would it be useful to write a script to transform this lexicon into PLS format? Could the ASCII phonemes be transformed into IPA? Which programming language would be suitable? Perl could do the job.
- Who is familiar with the HTK format outside the ASR development community? Probably almost no one. We should try to have an XML (PLS) export interface. I would like to be able to export my own lexicon in PLS/IPA format.

Those are just my thoughts. Internally, it is obviously necessary to work with the HTK ASCII format. But I would like to be able to export in an XML format. So the exported lexicon could be processed e.g. with XPath.
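Such a script doesn’t have to be written in Perl. Here is a rough Python sketch of the idea: it splits one lexicon line into the uppercase word, the word in square brackets, and the phonemes, and then builds a PLS <lexeme> from it. The ASCII-to-IPA table only covers the four symbols of the entry g & N @ (gæŋə); a real export would need the complete table, which I don’t have. The names parse_line and to_lexeme are my own.

```python
import re
import xml.etree.ElementTree as ET

# WORD, whitespace (tabs), [Output], whitespace, phonemes separated by spaces.
LINE = re.compile(r"^(\S+)\s+\[(.+?)\]\s+(.+)$")

# Only the symbols of the single entry "g & N @"; not a complete table.
ASCII_TO_IPA = {"g": "g", "&": "\u00e6", "N": "\u014b", "@": "\u0259"}


def parse_line(line):
    """Split one lexicon line into (word, output, [phoneme, ...])."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a lexicon line: {line!r}")
    word, output, phones = match.groups()
    return word, output, phones.split()


def to_lexeme(line):
    """Build a PLS <lexeme> element with an IPA phoneme from one line."""
    _, output, phones = parse_line(line)
    lexeme = ET.Element("lexeme")
    ET.SubElement(lexeme, "grapheme").text = output
    ET.SubElement(lexeme, "phoneme").text = "".join(
        ASCII_TO_IPA[p] for p in phones
    )
    return lexeme


line = "G\u00c4NGE\t\t[G\u00e4nge]\t\tg & N @"
print(parse_line(line))
print(ET.tostring(to_lexeme(line), encoding="unicode"))
```

The sketch uses the bracketed form as grapheme, because it keeps the original mixed-case spelling.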

Ubuntu: Importing the German PLS dictionary

Tuesday, May 5th, 2009

I want to import the current German PLS dictionary. I imported this dictionary before under Win XP. Now I am going to import it under Ubuntu 9.04. By the way, I think that this lexicon is encoded in UTF-8.

import-pls

To accomplish this goal, I had to
(1) click on ‘word list’,
(2) click on ‘Import Dictionary’,
(3) choose the previously downloaded XML PLS dictionary.

I want to use under Ubuntu 9.04 just the German PLS dictionary. I have a different hard drive with Ubuntu 8.10 (will upgrade in the near future) – this is for the English VoxForge dictionary.

Let’s take a look at the imported dictionary:

zustaendig

I filtered the lexeme “zuständig”. You can see that you get several entries. I think that the word “zuständigere” doesn’t exist in the German language. We can live with that for the moment. I know that the dictionary contains some words that don’t exist. It is (partially) my own fault because I submitted some non-existing German words. But keep in mind: The German PLS dictionary has been edited by humans, and the words haven’t been added automatically. So there shouldn’t be too many crap words inside the German PLS dictionary. It’s far from being perfect, but it should help us with the first steps. A lot of words are missing, but the most important words of the German language are included (really great work, Timo). When designing the German PLS dictionary we follow Zipf’s law.

By the way, it is intended that this PLS GPL dictionary may be edited to get German dialects (e.g. Austrian German). Personally, I focus on Standard German. Austrian German might be implemented in a separate XML PLS document. I don’t know exactly how it will be done in the future. But it is important to know: The first step is made by creating the German PLS pronouncing dictionary. A second step could be to make it compatible with Austrian German (e.g. by modifying the Standard German dictionary).

Because we are humans, we are using IPA when creating the PLS dictionary. Simon will transform it into ASCII internally. The IPA can be used to create a (separate?) Austrian German lexicon.

I am trying to stick to the following standards: PLS/XML, GPL, UTF-8, IPA. It is great to see that I can use these standards, and obviously simon is capable of handling them.

In the long term, I would like to have a button ‘Export Dictionary’. I don’t need this feature at the moment, but it might be useful. I just removed the word “zuständigere”:

remove

It isn’t possible to export the modified dictionary. Would such a function be useful? For me, it would be useful. But for the vast majority of people probably not.

OK, that’s all for now. I will try again later to train a few words with simon. First, I had to learn to optimize sound under Jaunty Jackalope. It is so complicated and pretty time consuming. But I have learned a lot about sound under Ubuntu. And I am confident that I should find a solution after a few trials.

Importing the new German PLS dictionary

Monday, February 9th, 2009

A few hours ago, a new version of the German PLS dictionary was released. The size is 670 kB; the previous version was 210 kB. I had imported the previous version into Simon.

dictionary-1.png
The shadow lexicon isn’t displayed, but available

Let’s import the new German PLS dictionary.
dictionary-2.png
Importing a dictionary with Simon

And now, you have to select the type of dictionary. You can choose between HADIFIX, HTK, PLS, and SPHINX. Let’s choose PLS.
dictionary-3.png
Selecting PLS as type of dictionary

You have to download the dictionary. Then you can import it.

dictionary-4.png
Import the PLS dictionary into Simon

Now you have to wait a few moments.

dictionary-5.png
Simon is importing the PLS dictionary

Let’s compare the XML version displayed by Firefox with the imported version.

dictionary-6.png
Compare the entry ‘Aachen’

The grapheme ‘Aachen’ has three possible pronunciations: aːxn̩ – aːxən – aːxŋ. All pronunciations are included in the Simon shadow dictionary in HTK compatible format.

So I am happy to see that we are making some progress. Thanks to Timo for releasing the new PLS dictionary.

kbuildsycoca4.exe; PLS

Sunday, December 28th, 2008

A few hours ago, I tried to install simon-0.2-beta-1-win32.exe. The installation was successful, but I wasn’t able to start Simon. Now I tried a different way. First, I started ‘H:\Program Files\simon 0.2\bin\kbuildsycoca4.exe‘, and then I started Simon 0.2 in the Windows start menu. And now it worked.

Simon version 0.2-beta-1

I am interested to know whether it is possible to import the German PLS dictionary (this version has been formatted with XSLT). Let’s take a look at the following screenshot:

it is possible to import a PLS lexicon
It should be possible to import the German PLS lexicon.

In Simon version 0.1, it wasn’t possible to import a PLS lexicon. That is an improvement. After pressing the next button, you are asked to provide the PLS/XML dictionary.
Obviously, other dictionaries provide information about terminals of the words. What does this mean? Why would it be useful to provide terminals?

import a PLS dictionary into Simon
Import a PLS dictionary

In my opinion, it would be useful if such a dictionary were an integral part of Simon (without the need to download and import it). I want to get started as fast as possible. And I would like to see some results. But it is getting better and better. Version 0.2 seems to be easier to handle than version 0.1. You don’t have to ask for the HADIFIX-BOMP from the University of Bonn. You can just take the German PLS dictionary. Will it work? The German pronunciation dictionary contains a lot of special characters (ä,ö,ü,ß; IPA-symbols). Well, I will see.

Let’s take a look at the encoding of the German PLS dictionary. It is UTF-8. I hope that this character encoding will be OK. Keep in mind: character encoding is a major issue when it comes to non-ASCII characters. I don’t like these problems. But they seem to occur on a regular basis (this applies to languages like Spanish as well).

Well, I think that I will save the German PLS lexicon. I have just done that. Now I have to import it. On my computer, it has the path ‘H:/Documents and Settings/xpprof32/My Documents/200812/german.xml‘. Let’s see whether it will work out. I am now pressing the next button. Simon indicates that the dictionary has been imported successfully.

I just marked the option to include unused words from the shadow lexicon:
include unused words from the shadow lexicon
Obviously, the German PLS dictionary has become the shadow lexicon

And I think that the IPA symbols have been transformed into ASCII characters. That is really great. Why is that great? Because it is pretty comfortable to create a pronunciation dictionary using IPA symbols. They are quite readable for the human eye. But the computer needs ASCII characters. Obviously, Simon transforms the IPA symbols into ASCII characters automatically.

Simon indicates that the category is unknown. And at the moment, the recognition rate of all the words is zero. Of course, I haven’t tried to do some recognition yet.

Let’s stop here. I was able to start Simon version 0.2 (after starting kbuildsycoca4.exe). And I was able to import the German PLS dictionary.

Simon 0.2-beta-1 is available for download

Sunday, December 28th, 2008

A few hours ago, Simon 0.2-beta-1 was released. You can get it for Linux, and for Windows 32-bit. I haven’t tested the new release yet. But I am planning to do so. The new version should be able to import a PLS pronunciation dictionary.