Posts Tagged ‘HTK’

HTK website is not reachable

Friday, January 1st, 2010

At the moment, it seems that the HTK website is not reachable. Here is what you can do:
- install simon,
- read the simon handbook,
- wait until the HTK website is available. Probably, it is just a temporary issue. No need to worry.

You can install simon without installing HTK. But to build speech models, you need HTK.

Something for mathematicians who want to learn more about the theoretical background of HMMs in general: read the PDF The Application of Hidden Markov Models in Speech Recognition.

Why should you read the PDF with the theoretical background? Personally, I would understand just about 1% of the things they explain (I read parts of the HTK book – I understood almost nothing). But at least I understand one fundamental thing: HMMs are a great thing for speech recognition. The HTK toolkit handles HMMs. And simon makes use of the HTK toolkit (the HTK toolkit has to be installed separately; it is not part of simon).

In the end, it is working. Watch my newest video where I am dictating 200 German words with simon.

So there are several things that you can do until the HTK toolkit is available again.

Vietnamese Hanoi dictionary (HTK)

Friday, December 11th, 2009

I just imported a Vietnamese Northern (Hanoi) dialect dictionary. You can download the dictionary. I imported it as an HTK lexicon. This is the result:

vietnamese-htk

It looks fine. Here is a small excerpt:

mà [mà] m aa2
má [má] m aa3
mả [mả] m aa4
mã [mã] m aa5
mạ [mạ] m aa6
ma [ma] m aa7

If you speak the Vietnamese language, you should get the concept. The different a-vowels are different phonemes (aa2, aa3, aa4, aa5, aa6, aa7). This approach should be OK.
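To make the idea concrete, here is a minimal Python sketch that splits the phoneme symbols from the excerpt into a base vowel and a trailing digit. Reading the digit as a tone marker is my assumption based on the excerpt; the dictionary itself may define it differently. The function name split_tone is hypothetical.

```python
import re

# Assumption: a phoneme such as "aa2" is a base symbol plus one trailing digit.
_TONED = re.compile(r"^([a-z]+)([1-9])$")

def split_tone(phoneme):
    """Return (base, digit) for symbols like 'aa2', or (phoneme, None) otherwise."""
    match = _TONED.match(phoneme)
    if match is None:
        return phoneme, None
    return match.group(1), int(match.group(2))

# The six entries from the excerpt: word -> phoneme list
entries = {
    "mà": ["m", "aa2"],
    "má": ["m", "aa3"],
    "mả": ["m", "aa4"],
    "mã": ["m", "aa5"],
    "mạ": ["m", "aa6"],
    "ma": ["m", "aa7"],
}

for word, phonemes in entries.items():
    print(word, [split_tone(p) for p in phonemes])
```

All six words share the consonant m and the base symbol aa; only the digit differs, which is why each of them ends up as a separate phoneme in the lexicon.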

It would be nice if a native speaker tried to record a few Vietnamese words with simon:

record-vietnamese

It would be interesting to know whether it works since Vietnamese is completely different from English. I recommend that you try to record 10 different Vietnamese words with simon (each word 8 times).

Importing the Voxforge dictionary

Monday, August 31st, 2009

I am now importing the Voxforge dictionary into simon from this location: /home/liberty/200908/sam/english/VoxForgeDict. I had downloaded it from here (VoxForge.tgz). It is in HTK format. What does the HTK format look like? Here is a small excerpt from the dictionary:

APPROACH [APPROACH] ax p r ow ch
APPROACHABLE [APPROACHABLE] ax p r ow ch ax b ax l
APPROACHED [APPROACHED] ax p r ow ch t
APPROACHES [APPROACHES] ax p r ow ch ax z
APPROACHES(2) [APPROACHES] ax p r ow ch ix z
APPROACHING [APPROACHING] ax p r ow ch ix ng
APPROBATION [APPROBATION] ae p r ax b ey sh ax n

The VoxForge dictionary contains about 130k words.
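The line layout in the excerpt above (headword, optional variant number in parentheses, output symbol in square brackets, phoneme sequence) can be read with a few lines of Python. This is only a sketch based on the excerpt, not a full implementation of the HTK dictionary format; the names LexiconEntry and parse_htk_line are hypothetical.

```python
import re
from typing import NamedTuple

class LexiconEntry(NamedTuple):
    word: str        # headword without the "(2)" variant suffix
    variant: int     # 1 for the first pronunciation, 2 for "(2)", and so on
    output: str      # the symbol inside the square brackets
    phonemes: list   # phoneme symbols in order

# WORD or WORD(n), whitespace, [OUTPUT], whitespace, phonemes
_LINE = re.compile(
    r"^(?P<word>\S+?)(?:\((?P<variant>\d+)\))?\s+"
    r"\[(?P<output>[^\]]*)\]\s+(?P<phonemes>.+)$"
)

def parse_htk_line(line):
    """Parse one lexicon line shaped like the excerpt; raise ValueError otherwise."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not in the expected format: {line!r}")
    return LexiconEntry(
        word=match["word"],
        variant=int(match["variant"] or 1),
        output=match["output"],
        phonemes=match["phonemes"].split(),
    )

print(parse_htk_line("APPROACHES(2) [APPROACHES] ax p r ow ch ix z"))
```

The alternative pronunciation APPROACHES(2) is returned with the same headword as APPROACHES and variant 2, so both pronunciations can be grouped under one word.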

First, I imported /home/liberty/200908/sam/english/cmudict.0.6d. It is in Sphinx format:

APPROACH AH0 P R OW1 CH
APPROACHABLE AH0 P R OW1 CH AH0 B AH0 L
APPROACHED AH0 P R OW1 CH T
APPROACHES AH0 P R OW1 CH AH0 Z
APPROACHES(2) AH0 P R OW1 CH IH0 Z
APPROACHING AH0 P R OW1 CH IH0 NG
APPROBATION AE2 P R AH0 B EY1 SH AH0 N

So you now know the difference between a dictionary that is stored in HTK format and one that is stored in Sphinx format. Both dictionaries (VoxForgeDict and cmudict.0.6d) contain about 130k words each. I don’t know whether they share the same phoneme set or not. My guess is that both lexicons are using CMU-40, but I don’t know, so I could be wrong!
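One way to approach the phoneme-set question is to collect the symbols from both files and compare them. The Python sketch below does this for the two excerpts only. Lowercasing the Sphinx symbols and stripping the trailing stress digits (0, 1, 2) is my assumption about how to make them comparable; the function names are hypothetical, and a result on seven lines says nothing definite about the full 130k-word dictionaries.

```python
HTK_EXCERPT = """\
APPROACH [APPROACH] ax p r ow ch
APPROACHABLE [APPROACHABLE] ax p r ow ch ax b ax l
APPROACHED [APPROACHED] ax p r ow ch t
APPROACHES [APPROACHES] ax p r ow ch ax z
APPROACHES(2) [APPROACHES] ax p r ow ch ix z
APPROACHING [APPROACHING] ax p r ow ch ix ng
APPROBATION [APPROBATION] ae p r ax b ey sh ax n
"""

SPHINX_EXCERPT = """\
APPROACH AH0 P R OW1 CH
APPROACHABLE AH0 P R OW1 CH AH0 B AH0 L
APPROACHED AH0 P R OW1 CH T
APPROACHES AH0 P R OW1 CH AH0 Z
APPROACHES(2) AH0 P R OW1 CH IH0 Z
APPROACHING AH0 P R OW1 CH IH0 NG
APPROBATION AE2 P R AH0 B EY1 SH AH0 N
"""

def htk_phonemes(text):
    """Collect the phoneme symbols that follow the [OUTPUT] field."""
    phones = set()
    for line in text.splitlines():
        if "]" in line:
            phones.update(line.split("]", 1)[1].split())
    return phones

def sphinx_phonemes(text):
    """Collect phoneme symbols, lowercased and without stress digits (assumption)."""
    phones = set()
    for line in text.splitlines():
        fields = line.split()
        for symbol in fields[1:]:          # fields[0] is the headword
            phones.add(symbol.rstrip("012").lower())
    return phones

htk = htk_phonemes(HTK_EXCERPT)
sphinx = sphinx_phonemes(SPHINX_EXCERPT)
print("only in HTK excerpt:   ", sorted(htk - sphinx))
print("only in Sphinx excerpt:", sorted(sphinx - htk))
print("shared:                ", sorted(htk & sphinx))
```

On these excerpts, the HTK side uses ax and ix where the Sphinx side uses AH0 and IH0, so the two symbol sets are at least not identical after this simple normalization. Running the same comparison on the complete files would give a better answer.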

I think that I will stick to VoxForgeDict because it is in HTK format.

So, what will be my next step? I want to train a few words with simon (words: this, is, a, different, approach). Then I will compile the speech model (= synchronize with ksimond). After that, I will test whether simon recognizes my voice.

If it works, I will try to run a test with sam. I want to test the sentence This is a different approach. with sam. This is the first sentence of my English files.

I have to get familiar with the whole training and testing process. In the long term, I want to use sam for model creation and model testing.

I just imported the lexicon VoxForgeDict into simon. Maybe I should define a grammar now? I just did that: Now I have a grammar with just one category: Unknown. I know that this isn’t sufficient for testing the whole sentence This is a different approach., but I will try to fix that later when the problem occurs.

I am now adding the word this:

this

I can’t record the word because I can’t restart simon. I think that I will have to get the current snapshot via svn.

Compiling the svn version (revision 827)

Saturday, May 16th, 2009

Today, I compiled the svn version (revision 827) of simon. This is what I did:
- I removed the simon .deb package with the ‘Computer Janitor’.
- $ sudo apt-get install cmake
- $ cd /media/Hitachi/speech2text/trunk/
- I took a look into the simon wiki.
- $ sudo apt-get install subversion kdelibs5-dev portaudio19-dev libxtst-dev cmake build-essential bison flex gettext gettext-kde kdeartwork
- liberty@liberty-desktop:/media/Hitachi/speech2text/trunk$ ./build_ubuntu.sh
- It took a while to compile simon, but it worked out very well.

Now I am reading the simon handbook. It says in Chapter 2 / Overview / Architecture:

“simon is used to create and maintain a representation of your pronunciation and language. This representation is then sent to the server simond which compiles it into a usable speech model.”

Which component interacts with HTK? Is it simon, or is it simond? I am a little confused after reading these lines. I would like to know: how does simon interact with the HTK toolkit? Or have they found a replacement for HTK? When using simon 0.1, I had to set paths to different components of HTK. Why isn’t this necessary any more? Probably, simon was completely redesigned in the move from the 0.1 branch to the 0.2 branch.

Describing the world with PLS/SSML

Friday, May 15th, 2009

I just read in the simon blog about XML standards. I want to reply to some of the remarks:

“this might be interesting to other readers”

I agree. That’s the reason why I started blogging about simon. I want to give some feedback to the developers. And maybe other people might be interested as well. People need to know that simon is a project with very high potential: open source speech recognition for the masses might become a reality in the near future.
This is important to know for large corporations and governments as well: should they continue to use Win XP, or should they upgrade to Win Vista (or the upcoming Windows 7)? One aspect of this decision can be: is speech recognition available or not? Windows Vista does have built-in speech recognition. And Ubuntu Linux? It doesn’t offer any sufficient ASR at the moment. But that could change, hopefully in the not too distant future, thanks to simon. So my goal is to influence decisions.

“simon does support importing PLS dictionaries”

That’s great. Why am I so into XML-based standards? Because I understand them. And I want to produce something that is of great value for others (not limited to ASR development). Even search engines should be able to understand what is meant when I am offering SSML files. But does a search engine understand what is meant when it analyzes files that are in the HTK or Julius format? I doubt that. HTK and Julius formats are obviously very specific standards just for ASR developers. But I am thinking in a more general sense.

Let me explain what I do believe in: The world is a giant global graph: “I’ll be thinking in the graph. My flights. My friends. Things in my life.” – the inventor of the WWW says that. And I couldn’t agree more. XML is probably the best language to describe this giant graph of knowledge. This is my ideology. If XML doesn’t suit your specific needs with HTK, I understand that.
And, by the way: I don’t like to read SAMPA. I prefer the IPA when editing the pronouncing dictionary. Sometimes, I ask myself the question: why don’t they switch from SAMPA to the IPA? Why don’t they switch their homepage from ISO-8859-1 to UTF-8? OK, they are Americans. They don’t have problems with exotic characters like “äöüß”. Do they care about other languages? Probably not. We don’t live in the time of old-fashioned ASCII any more. There are more spoken languages in the world than just English. The English speaking developers may be comfortable with ASCII. A lot of modernizations would be useful (ASCII->UTF-8; HTK format->PLS; Voxforge prompts->SSML). I can’t criticize the simon developers for that. It is not their fault.

“time constraints”

I understand that there are priorities.

“export functionality is a low priority feature”

OK. From my point of view, Voxforge needs an export functionality. And the export could be done via SSML/XML (<speak> and <audio> elements). The question is: how can I train the speech collected by Voxforge with simon? My proposition is to use SSML as an intermediate step. This is additional work in the short term, but in the long term we might increase our productivity.

PLS and SSML are developed by speech experts. And currently, I am convinced that it is not a wrong decision to stick to these standards. I have read parts of the HTK book, and it takes a lot of time to get into it.

“PLS standard does not allow for any terminal information”

We could add terminal information, and create a standard XML file with the tags <lexeme><grapheme><phoneme><terminal>. Maybe a future version of the PLS will suit our needs. We can use just XML and add the missing <terminal> element. I don’t know about the exotic BOMP standard; I couldn’t find an entry in Wikipedia. So I assume that BOMP is not a relevant standard. I want to use common standards that are well understood outside of the ASR development community. The W3C Speech Interface Framework offers a lot of XML-based markup languages. So people who don’t know about the specific needs of HTK/Julius but have a basic understanding of XML can immediately understand what is being offered. They don’t have to do lots of research.

I am not very familiar with HTK and Julius. I tried several times: I installed HTK and read the Voxforge tutorial. I made progress, but unfortunately I didn’t gain enough skill to get through the Voxforge HTK tutorial. Maybe I didn’t try hard enough.
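To show what such a file could look like, here is a minimal Python sketch that builds one <lexeme> entry with an added <terminal> element. The <terminal> element is my proposal and not part of the PLS standard. A real PLS file also needs the PLS namespace, a version, and an alphabet declaration, all of which I leave out here. The function name make_lexeme is hypothetical.

```python
import xml.etree.ElementTree as ET

def make_lexeme(grapheme, phonemes, terminal):
    """Build a PLS-like <lexeme> with a non-standard <terminal> child."""
    lexeme = ET.Element("lexeme")
    ET.SubElement(lexeme, "grapheme").text = grapheme
    ET.SubElement(lexeme, "phoneme").text = " ".join(phonemes)
    # Not part of PLS: the proposed extension that carries the grammar category.
    ET.SubElement(lexeme, "terminal").text = terminal
    return lexeme

lexicon = ET.Element("lexicon")
lexicon.append(make_lexeme("approach", ["ax", "p", "r", "ow", "ch"], "Unknown"))

xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```

The phoneme string is taken from the VoxForge dictionary entry for APPROACH; PLS itself would normally expect IPA or another declared alphabet at this place.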

“no reason to introduce new file formats”

Then I will try to develop something on my own. Currently, I am thinking about the question whether we should take a closer look at Symfony to develop an evaluation system for the Voxforge prompts. The result would be that we could deliver high quality training material for simon. By the way, I am primarily interested in dictation (not command and control). And for dictation, we need utterances to get good recognition results. Simon allows me to record just single words, not utterances. I am not convinced by that concept. Training should be done with utterances, not just single words. Voxforge made the right decision to collect utterances.

“importing of a “normal” HTK prompts file”

That would be sufficient. I would appreciate it if such a feature were implemented.

My proposition is: Voxforge (HTK prompts) -> SSML -> simon
A shorter way would be: Voxforge (HTK prompts) -> simon
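The first of these two paths can be sketched in Python. I assume here that each line of a prompts file consists of an utterance identifier followed by the words of the utterance, and that the recording for an identifier is stored as a .wav file with the same name. The identifier sample1, the directory wav, and the function name prompts_to_ssml are hypothetical. The text inside each <audio> element is the fallback text that SSML allows there.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def prompts_to_ssml(prompts_text, audio_dir="wav", lang="en"):
    """Turn 'identifier WORD WORD ...' lines into one SSML <speak> document."""
    ET.register_namespace("", SSML_NS)   # serialize SSML as the default namespace
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    for line in prompts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue                     # skip empty or incomplete lines
        utterance_id, words = fields[0], fields[1:]
        audio = ET.SubElement(
            speak,
            f"{{{SSML_NS}}}audio",
            {"src": f"{audio_dir}/{utterance_id}.wav"},
        )
        audio.text = " ".join(words)     # transcription of the recording
    return ET.tostring(speak, encoding="unicode")

print(prompts_to_ssml("sample1 THIS IS A DIFFERENT APPROACH\n"))
```

Going the other way (SSML back to a prompts file) would only need the src attribute and the text of each <audio> element, which is why I think the intermediate step does not lose information.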

Everyone should use the shortest path. But I am thinking about the question: How can we evaluate the Voxforge prompts? Some of them should be sorted out. And how can we achieve this goal?

You see, there are several aspects. The world is not just about simon. It is about Voxforge, too.

“introduce an additional source of errors”

You were able to implement PLS import. If you don’t want to implement SSML, that would be OK.

I think that I will have to read and try the Voxforge tutorial about HTK again.

It is OK not to focus on PLS export, and SSML. Just do what you think is best for the simon project.

I hope that you now understand my point of view better than before. It is an ideological view: describing the world as a graph. Speech recognition is just a small part of this giant graph. I would like it if Voxforge offered its prompts in SSML format so that other projects could import the prompts directly. There may be projects out there that focus on speech synthesis. These projects could use the prompts, too.

P.S.: I changed the title of my blog to “testing simon”. Obviously, the developers prefer “simon” over “Simon”.

“Unable to create a valid backtrace”

Wednesday, May 6th, 2009

I just got the following error message:

backtrace

This is what happened beforehand: I started simon and ksimond. Then I pressed the Connect button. Probably I pressed the Connect / Activate button another time (I am not sure about that). Probably, simon and ksimond were connected.

A few hours ago, I installed HTK 3.4.1 and HDecode on my machine. But I didn’t create a grammar for simon. And I didn’t synchronize the one word that I have trained so far (it is the word ‘Aachen’). I haven’t defined the terminal for ‘Aachen’ yet. It is a noun; I should try to define that later.

Bug fix release: Simon 0.2-beta-2

Thursday, January 15th, 2009

I just saw that a bug fix release is available. I will check that out later.

Currently, I am examining HTK. And I am making some progress. I hope that I will be able to process more than 10,000 German utterances with the HTK toolkit. I would like to be able to process these prompts (format follows the SSML audio element) with Simon. But how could I achieve this goal? I don’t want to use Simon to record my speech. This is too complicated. I prefer Audacity because I can record about 30 utterances in just one Audacity session, which is pretty convenient: just go to Audacity > File > ‘Export Multiple’.

Simon 0.1 has internet extensions. I would like a future version of Simon to have an internet extension for the SSML audio element so that I can import these 99 utterances directly from the internet.

Selecting English HTK dictionary

Sunday, January 4th, 2009

Simon 0.2 on Ubuntu has a good look and feel. This is my first impression. I now want to import the English HTK dictionary from Voxforge.

After downloading and extracting VoxForge.tgz, I have the file VoxForgeDict on my desktop. I am now importing the file /home/ubuntu/Desktop/VoxForgeDict. The import was successful.

HTK binaries in an Ubuntu 8.10 environment

Tuesday, November 18th, 2008

Here are the HTK binaries in an Ubuntu 8.10 environment.
You will have to point to the binaries marked in green when configuring Simon.

Typing “HVite -V” into the terminal

Monday, November 17th, 2008

I just typed HVite -V into the terminal. As a result, some HTK version information was displayed.
Everything seems to be OK with my HTK installation.

I found this tip in this how-to:

“Type in “HVite -V” in a Command Console Window;

if your system lists all the options available to the hvite command, then HTK is installed properly.”

Downloading Julius 4.0.2

Tuesday, November 11th, 2008

I am now downloading Julius 4.0.2 because I am following these instructions.

I just learned that there is a “HTK-to-Julius grammar converter”. Maybe I will take a look at the source code.

I just extracted the archive julius-4.0.2.tar.gz using the Archive Manager (under Ubuntu). Now I have to change into the directory.
root@ubuntu-desktop:/home/ubuntu# cd julius-4.0.2
After pressing the enter key, I am now in the correct directory.
root@ubuntu-desktop:/home/ubuntu/julius-4.0.2#
And now, I will type into the terminal ./configure && make. This is the corresponding line in the terminal:
root@ubuntu-desktop:/home/ubuntu/julius-4.0.2# ./configure && make

Obviously, some problems occurred:
configure: error: flex library not found! installation terminated
configure: error: ./configure failed for gramtools

I don’t know whether these problems are substantial. And I don’t know how to fix them.

I just downloaded HTK

Tuesday, November 11th, 2008

I just downloaded HTK following these instructions. Currently, I am trying to install Simon under Ubuntu Intrepid Ibex.

Afterwards, I compiled HTK. This is what the terminal showed:
root@ubuntu-desktop:/home/ubuntu/htk# ./configure && make

And then, I typed make install into the terminal. Here is the line from the terminal:
root@ubuntu-desktop:/home/ubuntu/htk# make install

As a result, a lot of executables were installed in the directory /usr/local/bin/. Here are a few names of those executables: HBuild, HCompV, HCopy, HDMan, HERest, HInit, HList, HParse, etc.
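A quick way to check whether these executables can be found from the command line is to look them up on the PATH. The following Python sketch does that with the standard library. The tool list contains the names mentioned above plus HVite, which I added myself; the function name locate_tools is hypothetical.

```python
import shutil

# Names from the list above, plus HVite (my addition).
HTK_TOOLS = ["HBuild", "HCompV", "HCopy", "HDMan", "HERest",
             "HInit", "HList", "HParse", "HVite"]

def locate_tools(names):
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, path in locate_tools(HTK_TOOLS).items():
        print(f"{name:8} {path or 'not found on PATH'}")
```

If a tool shows up as not found although make install succeeded, the directory /usr/local/bin/ is probably missing from the PATH of the current user.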

I think that some of those executables will have to be linked from Simon. I hope to find that out later.