I want to import prompts into simon. As an example, say I want to import the German prompts 01. The following problems come up:
1. Convert FLAC to WAV. I do this with the following command (the "converted" directory has to exist beforehand):
$ for f in *.flac; do sox "$f" -t wav -r 16000 -s -c 1 "converted/${f%.flac}.wav"; done
Problem solved.
2. Convert the SSML file into a normal prompts file (HTK format). I will solve this with an XSLT stylesheet. Problem solved.
3. One problem still remains: what should be done with the capital letters at the beginning of each sentence? I don't want to add the capitalized first words of the following sentences (Hast, Ich, Der) to the dictionary:
Hast du mich verstanden?
Ich habe dich jetzt verstehen können.
Der Schmerz wird mit der Zeit nachlassen.
How could this issue be solved? I don’t want to convert the words at the beginning of each sentence into lowercase manually.
Maybe some kind of matching would be possible? It could work like this: if a word in a prompts file is capitalized only because it starts a sentence (Hast, Ich, Der), check whether a lowercase version of this word is available in the dictionary (hast, ich, der), and use that version instead.
I want to train several thousand sentences with simon/sam. These are normal sentences with ordinary capitalization. How can the first letter of a sentence be lowercased automatically when the first word is not a noun?
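
To make the matching idea more concrete, here is a minimal sketch in Python. It is not something simon or sam provides. The function names, the in-memory set used as the dictionary, and the prompt line layout (a recording name followed by the words) are my own assumptions for illustration. As an extra guard that goes beyond the plain lookup, the first word is left untouched if its capitalized form is already in the dictionary, since it may then be a noun.

```python
def lowercase_sentence_start(sentence, dictionary):
    """Lowercase the first word of a sentence when only its lowercase
    form is known to the dictionary (hypothetical helper)."""
    words = sentence.split()
    if not words:
        return sentence
    first = words[0]
    lowered = first[0].lower() + first[1:]
    # Assumption: if the capitalized form is already in the dictionary,
    # it may be a noun, so it is kept as it is.
    if first not in dictionary and lowered in dictionary:
        words[0] = lowered
    return " ".join(words)


def lowercase_prompt_line(line, dictionary):
    """Apply the same rule to one prompts line.

    Assumption: each line looks like "<recording-name> <word> <word> ...".
    """
    parts = line.split(maxsplit=1)
    if len(parts) < 2:
        return line
    recording_name, sentence = parts
    return recording_name + " " + lowercase_sentence_start(sentence, dictionary)


if __name__ == "__main__":
    # Toy dictionary; a real one would be loaded from the lexicon file.
    dictionary = {"hast", "ich", "der", "du", "mich", "Schmerz"}
    for sentence in (
        "Hast du mich verstanden?",
        "Ich habe dich jetzt verstehen können.",
        "Der Schmerz wird mit der Zeit nachlassen.",
    ):
        print(lowercase_sentence_start(sentence, dictionary))
```

Words that exist as both a noun and a non-noun (Essen/essen, for example) cannot be resolved by a dictionary lookup alone, so such sentences would still need a manual check or a part-of-speech tagger.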






