Ralf’s Italian speech model

Some words about the creation of Ralf’s Italian speech model.

1. I took a look at the Italian frequency list. It is licensed under the LGPL – very good.

2. Linux Mint terminal:

sort -u wordlist-italian-1 > wordlist-italian-2
saxonb-xslt -ext:on -s:italian-dictionary.xml -xsl:compare-popular-words.xsl
saxonb-xslt -ext:on -s:popular-italian-words-dictionary.xml -xsl:lexicon2phonemelist.xsl
sort -u phonemelist-1 > phonemelist-2
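
As a minimal sketch of what the two sort -u calls above do, here is the same command applied to a tiny made-up word list. The sample words and the temporary directory are assumptions for illustration only. Setting LC_ALL=C is an optional addition that is not part of the original commands; it keeps the sort order independent of the system locale.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input containing one duplicate entry.
printf '%s\n' 'casa' 'acqua' 'casa' 'bianco' > wordlist-italian-1

# Sort the lines and drop duplicates, as in step 2.
LC_ALL=C sort -u wordlist-italian-1 > wordlist-italian-2

cat wordlist-italian-2
```

Each word appears only once in the output file, in sorted order.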

3. Now I can dictate phonemelist-2 into Audacity: speak the IPA phoneme list into the microphone.

4. Mark the recording with a double-click.

5. Audacity > Analyze > Sound Finder.

6. Set the settings as follows:
Label starting point: 1 second
Label ending point: 1 second
These settings worked well for me. The label starting and ending points have to be one second because Simon analyzes the loudness of the “silence” (that is, the loudness of the background noise). In my opinion, one second of silence before and after each recording is sufficient.

7. You can see that each recorded word is marked by a number.

8. Sometimes, there are sounds which should be removed.
a. There is an area which should be silenced out. It has been marked by a number, but there shouldn’t be a number.
b. Remove the sound with this button.

9. Now let’s export the file labels.txt. Open Audacity > File > Export Labels.

10. Export labels as labels.txt. Press the Save button.

11. Now open labels.txt with the text editor Geany. There are three columns. The last column will be removed.

12. Geany > Search > Replace. Search for: \t\w+$
Leave the replacement field empty and enable Use regular expressions. This removes the third column (the label number) from each line.

13. You can see that the third column has been removed thanks to the regular expression procedure.
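
The same column removal can also be sketched on the command line instead of in Geany. This assumes that labels.txt is tab-separated with three columns, as described above. The sample values and the output name labels-2col.txt are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the exported label file:
# start time, end time, label number, separated by tabs.
printf '0.500000\t1.800000\t1\n2.900000\t4.100000\t2\n' > labels.txt

# Keep only the first two tab-separated columns (tab is cut's default delimiter).
cut -f1,2 labels.txt > labels-2col.txt

cat labels-2col.txt
```

Writing to a second file keeps the original export intact in case something goes wrong.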

14. Linux Mint terminal:

paste labels.txt phonemelist-2 > italian-pasted.txt

15. You can see that the document italian-pasted.txt now has a third column: the labels are the phonetic transcriptions!
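
Because paste joins the two files line by line, a label only gets the right transcription if both files have the same number of lines. The following is a minimal sketch of such a check before pasting. The sample times and transcriptions are made up for illustration, and the check itself is an addition, not part of the original procedure.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: two labels and two transcriptions.
printf '0.500000\t1.800000\n2.900000\t4.100000\n' > labels.txt
printf '%s\n' 'ˈakkwa' 'ˈkasa' > phonemelist-2

# paste pairs lines by position, so both files need the same line count.
labels_count=$(wc -l < labels.txt)
phonemes_count=$(wc -l < phonemelist-2)

if [ "$labels_count" -eq "$phonemes_count" ]; then
    paste labels.txt phonemelist-2 > italian-pasted.txt
    cat italian-pasted.txt
else
    echo "Line counts differ: $labels_count labels, $phonemes_count transcriptions" >&2
fi
```

If the counts differ, a sound was probably labeled by mistake or a word was missed during recording.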

16. Audacity > File > Import > Labels… and select italian-pasted.txt.

17. Take a look at the result. Each label is a phonetic transcription of the corresponding recording.

18. Audacity > File > Export Multiple…

19. Export format: FLAC. Press the Export button.

20. After a few moments, Audacity displays that the files have been successfully exported.
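
One way to double-check the export is to compare the number of FLAC files with the number of labels. The sketch below does this on made-up stand-in files. The directory name export and the file names are assumptions for illustration, and the check is an addition to the original procedure.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins: a two-line label file and two exported files.
printf '0.5\t1.8\ta\n2.9\t4.1\tb\n' > italian-pasted.txt
mkdir export
: > export/a.flac
: > export/b.flac

# Count the labels and the exported FLAC files.
label_count=$(wc -l < italian-pasted.txt)
flac_count=$(find export -type f -name '*.flac' | wc -l)

if [ "$label_count" -eq "$flac_count" ]; then
    echo "OK: $flac_count files for $label_count labels"
else
    echo "Mismatch: $flac_count files for $label_count labels" >&2
fi
```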

21. Now let’s create the prompts file:

saxonb-xslt -ext:on -s:popular-italian-words-dictionary.xml -xsl:lexicon2prompts.xsl

I will need it later.

22. Start Simon. I want to create an Italian scenario.

23. Press the Manage scenarios button.

24. Press New.

25. You have to give the scenario a name. Name it Italian. You can change the license to GPLv3. Then press the Add button.

26. Enter your name and a contact. Press OK.

The name of the scenario will be Italian. The license is GPLv3. Author is Ralf Herzog. Press the OK button.

27. The Italian scenario is now available (left column). Move it to the right column.

28. Press the Open “Italian” button.

29. Press the Import dictionary button.

30. Import as Active dictionary.

31. Select the type of the dictionary. Select PLS lexicon. Press the Next button.

32. Provide the downloaded PLS dictionary. Press the blue folder button.

33. The dictionary has been imported successfully. Press the Finish button.

34. Switch to the Grammar tab. Press the button Add sentence.

35. You can now enter the sentence structure.

36. Add the word “Unknown”.

37. Switch to the Commands tab. Press the button Manage plugins.

38. Press the Add button. It is necessary to activate the dictation plugin.

39. Select Dictation.

40. In the field Append text after result, press the space bar once. This means that a space follows each recognized word.

41. Go to Settings > Configure Simon. Simon has to be told that I want to import FLAC files (and not wav files).

42. a. Select Recordings.
b. Switch to the Post-Processing tab.
c. Now it is necessary to add the correct command. It is sox -t flac %1 -t wav %2.
d. Then press the Add button. Press Apply. Press OK.
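
To make the post-processing command easier to read, here is a minimal sketch of how it might look once the placeholders are filled in. It assumes that %1 stands for the input file and %2 for the output file, which the command suggests but the steps above do not spell out. The file names are hypothetical, and the sketch only prints the resulting command without running sox.

```shell
# Post-processing command from step 42c; %1 and %2 are placeholders.
template='sox -t flac %1 -t wav %2'

# Hypothetical file names, for illustration only.
input='sample-0001.flac'
output='sample-0001.wav'

# Substitute the placeholders to show the command that would result.
# This only builds and prints the command; it does not run sox.
expanded=$(printf '%s\n' "$template" | sed -e "s|%1|$input|" -e "s|%2|$output|")

echo "$expanded"
```

The -t flac and -t wav options tell sox the file type of the file that follows them, so the conversion does not depend on the file extensions.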

43. Now go to the Training tab. Press the button Import training data.

44. Import training samples: Select the path to the prompts file and the path to the FLAC files. Then press the Next button.

45. Simon is now importing the audio files. During import, they will be converted from FLAC to wav automatically.

46. The import of the folder has been completed. Press the Finish button.

47. Actions > Synchronize. Wait a few moments while Simon compiles the speech model. Then Actions > Activate.

48. Let’s dictate a few Italian words with Simon 0.3.80:

abbiamo acqua alla ambiato benissimo bianco caffè carino coma colon credibile dando dormire dura esco favore forse libertà importante impossibile incredibile

It is working under Linux Mint.

49. Now I can export the Italian scenario and the Italian base model.

50. Download Ralf’s Italian speech model, and use it with Simon 0.3.80 (it won’t work with Simon 0.3).
