<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>testing simon &#187; dictionary</title>
	<atom:link href="http://spirit.blau.in/simon/category/dictionary/feed/" rel="self" type="application/rss+xml" />
	<link>http://spirit.blau.in/simon</link>
	<description>my first steps with the simon speech recognition software</description>
	<lastBuildDate>Tue, 10 Jan 2012 14:59:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Ralf&#8217;s Interlingua dictionary</title>
		<link>http://spirit.blau.in/simon/2012/01/10/ralfs-interlingua-dictionary/</link>
		<comments>http://spirit.blau.in/simon/2012/01/10/ralfs-interlingua-dictionary/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 14:59:26 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[dictionary]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5743</guid>
		<description><![CDATA[This article explains how I create the dictionary, and what the imported result looks like in simon. A. Creation of the PLS dictionary: 1. Get spelling dictionary. 2. License is GPL. It says in the file README_en.txt: This spell check dictionary for Interlingua is licensed under GPL. [...] This hyphenation rules for Interlingua are licensed [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains how I create the dictionary, and what the imported result looks like in simon.</p>
<p><strong>A. Creation of the PLS dictionary:</strong></p>
<p>1. <a href="http://extensions.services.openoffice.org/en/project/dict-ia">Get</a> spelling dictionary.<br />
2. License is <a href="http://www.gnu.org/licenses/gpl.txt">GPL</a>. It says in the file README_en.txt:</p>
<blockquote><p>This spell check dictionary for Interlingua is licensed under GPL. [...] This hyphenation rules for Interlingua are licensed under GPL.</p></blockquote>
<p>This means that I can use this spelling dictionary as source.<br />
3. Extract <a href="http://extensions.services.openoffice.org/en/download/4581">dict-ia-2010-11-29.oxt</a>.<br />
4. ISO 639-1 <a href="http://en.wikipedia.org/wiki/Interlingua">language</a> code is <code>ia</code>.<br />
5. Probably I will <a href="http://en.wikipedia.org/wiki/Interlingua#Interlingua_alphabet">use this table</a> for grapheme to phoneme conversion.</p>
<p>6. Check the encoding of ia_iso.aff and ia_iso.dic. Both files are encoded in ISO 8859-1. Probably it is best if I convert the encoding of both files into UTF-8.<br />
<code>iconv -f ISO-8859-1 -t UTF-8 < ia_iso.dic > interlingua-utf8.dic<br />
iconv -f ISO-8859-1 -t UTF-8 < ia_iso.aff > interlingua-utf8.aff</code><br />
Change the first line in interlingua-utf8.aff into SET UTF-8. Both files contain CRLF at the end of each line (Windows mode). I don&#8217;t know whether this is ok with the unmunch command. I will check it out:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ unmunch interlingua-utf8.dic interlingua-utf8.aff > interlingua-wordlist</code></p></blockquote>
<p>Obviously, it worked. The CRLF is part of the source files. The target file contains just a LF (Unix mode). There are a lot of duplicate entries. I think that these duplicate entries will be removed later by an <code>.xsl</code> script.</p>
<p>7. Add lexicon tags at the beginning and the end of interlingua-wordlist.</p>
<p>8. Create XML file:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ saxonb-xslt -s:interlingua-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:interlingua.xml</code></p></blockquote>
<p>9. Create PLS dictionary:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Interlingua$ saxonb-xslt -s:interlingua.xml -xsl:'<a href='http://spirit.blau.in/simon/files/2012/01/improve-interlingua.xsl_.zip'>improve-interlingua.xsl</a>' -o:<a href="http://script.blau.in/interlingua-dictionary.xml.bz2">interlingua-dictionary.xml</a></code></p></blockquote>
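<p>The Windows line endings and the duplicate entries mentioned in step 6 could also be handled in the shell before the XSLT steps. The following is only a sketch and not part of the workflow described above: the function name <code>normalize_wordlist</code> and the demo words are made up for illustration.</p>

```shell
#!/bin/sh
# Sketch only: strip carriage returns (CRLF -> LF) and drop duplicate lines.
# normalize_wordlist is a hypothetical helper name, not an existing tool.
normalize_wordlist() {
    # tr -d '\r' deletes every CR byte; sort -u sorts and keeps one copy of each line.
    # LC_ALL=C gives a byte-wise, locale-independent sort order.
    tr -d '\r' | LC_ALL=C sort -u
}

# Demo with made-up entries: CRLF line endings and one duplicate.
printf 'casa\r\nabbate\r\ncasa\r\n' | normalize_wordlist
```

<p>Note that <code>sort -u</code> changes the order of the words, which may or may not matter for the later steps.</p>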
<p><strong>B. <a href="http://script.blau.in/interlingua-dictionary.xml.bz2">Download</a> the dictionary. Import it into simon.</strong></p>
<p><a href="http://spirit.blau.in/simon/files/2012/01/interlingua.jpg"><img src="http://spirit.blau.in/simon/files/2012/01/interlingua-293x300.jpg" alt="" title="interlingua" width="293" height="300" class="alignleft size-medium wp-image-5746" /></a>The left column contains the words. The pronunciation column contains the corresponding SAMPA transcriptions. The Category column contains just &#8220;Unknown&#8221; entries.</p>
<div style="clear:both"></div>
<p>Now you know how I created the dictionary and what the result looks like in simon.</p>
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2012/01/10/ralfs-interlingua-dictionary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ralf&#8217;s Arabic dictionary</title>
		<link>http://spirit.blau.in/simon/2012/01/10/ralfs-arabic-dictionary/</link>
		<comments>http://spirit.blau.in/simon/2012/01/10/ralfs-arabic-dictionary/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 11:55:50 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[dictionary]]></category>
		<category><![CDATA[Arabic]]></category>
		<category><![CDATA[PLS]]></category>
		<category><![CDATA[saxonb-xslt]]></category>
		<category><![CDATA[sed]]></category>
		<category><![CDATA[unmunch]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5726</guid>
		<description><![CDATA[This article explains the creation of an Arabic PLS dictionary and what the result looks like in simon. A. Creation of the dictionary: 1. Get Arabic spelling dictionary. 2. Check the license. Inside the file dict_ar-3.0.oxt there is a file with the name COPYING (in the docs folder). It says in the file: GPL 2.0/LGPL [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains the creation of an Arabic PLS dictionary and what the result looks like in simon.</p>
<p><strong>A. Creation of the dictionary:</strong></p>
<p>1. <a href="http://extensions.services.openoffice.org/en/project/Arabicspellchecker">Get</a> Arabic spelling dictionary.<br />
2. Check the license. Inside the file <a href="http://extensions.services.openoffice.org/en/download/4955">dict_ar-3.0.oxt</a> there is a file with the name COPYING (in the docs folder). It says in the file:</p>
<blockquote><p>GPL 2.0/LGPL 2.1/MPL 1.1 tri-license</p></blockquote>
<p>This means that I can use this tri-licensed spelling dictionary as source for my future GPLv3 PLS dictionary.</p>
<p>3. Now I have to extract <code>dict_ar-3.0.oxt</code>.<br />
4. Let&#8217;s try the <code>unmunch</code> command inside the Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Arabic$ unmunch ar.dic ar.aff > arabic</code></p></blockquote>
<p>It failed. I wasn&#8217;t able to unmunch the word list.<br />
5. I have to remove all numbers from ar.dic. This can be done with the <code>sed</code> command:</p>
<blockquote><p><code>sed 's/[0-9]*//g' ar.dic > arabic-without-numbers</code></p></blockquote>
<p>6. Remove the slash (&#8220;/&#8221;) from arabic-without-numbers with <a href="http://en.wikipedia.org/wiki/Geany">Geany</a>.<br />
7. Add lexicon tags at the beginning and the end of the file.<br />
8. Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Arabic$ saxonb-xslt -s:arabic-without-numbers -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:arabic.xml</code></p></blockquote>
<p>9. ISO 639-1 language code is ar.<br />
10. Maybe I will <a href="http://en.wikipedia.org/wiki/Romanization_of_Arabic#Comparison_table">use this table</a> for the grapheme to phoneme conversion.<br />
11. Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Arabic$ saxonb-xslt -s:arabic.xml -xsl:'<a href='http://spirit.blau.in/simon/files/2012/01/improve-arabic.xsl_.zip'>improve-arabic.xsl</a>' -o:<a href="http://script.blau.in/arabic-dictionary.xml.bz2">arabic-dictionary.xml</a></code></p></blockquote>
<p>I have to remove the number sign (&#8220;#&#8221;) with Geany from arabic.xml.</p>
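<p>Steps 5 and 6 could probably be combined into a single <code>sed</code> call. This is a minimal sketch under assumptions: the helper name <code>strip_flags</code> and the demo entries are made up, and dropping the lines that end up empty is an extra that the steps above do not include.</p>

```shell
#!/bin/sh
# Sketch only: remove digits and slashes from a Hunspell-style .dic file
# read on standard input. strip_flags is a hypothetical helper name.
strip_flags() {
    # 1st expression: delete every ASCII digit.
    # 2nd expression: delete every slash (| is used as the s-command delimiter).
    # 3rd expression (an assumption, not in the original steps): drop lines
    #                that are empty afterwards, e.g. a line that held only a number.
    sed -e 's/[0-9]//g' -e 's|/||g' -e '/^$/d'
}

# Demo with made-up input: a numeric first line and one entry with a numeric flag.
printf '2\nkitab/12\nqalam\n' | strip_flags
```

<p>With the real file, the function would read <code>ar.dic</code> on standard input and write the cleaned list to a file such as <code>arabic-without-numbers</code>.</p>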
<p><strong>B. <a href="http://script.blau.in/arabic-dictionary.xml.bz2">Download</a> the dictionary. Import it into <a href="http://simon-listens.blogspot.com/">simon</a>.</strong></p>
<p><a href="http://spirit.blau.in/simon/files/2012/01/arabic-pronunciation.jpg"><img src="http://spirit.blau.in/simon/files/2012/01/arabic-pronunciation-271x300.jpg" alt="" title="arabic-pronunciation" width="271" height="300" class="alignleft size-medium wp-image-5736" /></a>The left column contains 457089 Arabic words. The pronunciation column contains the corresponding SAMPA transcriptions. The third column contains only &#8220;Unknown&#8221; entries. This is because the PLS dictionary contains no <code>role</code> attributes.</p>
<div style="clear:both"></div>
<p>Now you know how I created the dictionary. And you know what the result looks like in simon.</p>
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2012/01/10/ralfs-arabic-dictionary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ralf&#8217;s Hebrew dictionary</title>
		<link>http://spirit.blau.in/simon/2012/01/10/ralfs-hebrew-dictionary/</link>
		<comments>http://spirit.blau.in/simon/2012/01/10/ralfs-hebrew-dictionary/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 09:43:30 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[dictionary]]></category>
		<category><![CDATA[Hebrew]]></category>
		<category><![CDATA[PLS]]></category>
		<category><![CDATA[saxonb-xslt]]></category>
		<category><![CDATA[unmunch]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5718</guid>
		<description><![CDATA[In 2009, I made some initial tests with Hebrew. Now it is time to develop a Hebrew PLS dictionary that is much bigger than the sample dictionary from 2009 (which I have deleted). This article explains how I create the dictionary, and what the result looks like when imported into simon. A. Creation of the [...]]]></description>
			<content:encoded><![CDATA[<p>In 2009, I made some <a href="http://spirit.blau.in/simon/2009/09/10/confidence-score-with-hebrew/">initial tests with Hebrew</a>. Now it is time to develop a Hebrew PLS dictionary that is much bigger than the sample dictionary from 2009 (which I have deleted). This article explains how I create the dictionary, and what the result looks like when imported into simon.</p>
<p><strong>A. Creation of the dictionary:</strong></p>
<p>1. <a href="http://extensions.services.openoffice.org/en/project/dict-he">Get</a> Hebrew spelling dictionary from OpenOffice.org.<br />
2. License is <a href="http://www.gnu.org/licenses/gpl-2.0.html">GPL</a>. There is a copyright notice inside the file <code>he_IL.aff</code>.</p>
<p>3. I tried to unmunch the dictionary in the Ubuntu terminal, but unfortunately I failed:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ unmunch he_IL.dic he_IL.aff > hebrew-test</code></p></blockquote>
<p>4. The source file <code>he_IL.dic</code> contains a lot of numbers. I remove them with the Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ <a href="http://www.cyberciti.biz/faq/sed-remove-all-digits-input-from-input/">sed</a> 's/[0-9]*//g' he_IL.dic > hebrew-without-numbers</code></p></blockquote>
<p>With Geany, I remove the &#8220;,&#8221; (commas) and the &#8220;/&#8221; (slashes) that are still in the file hebrew-without-numbers. Now I have a clean word list with about 43,000 Hebrew words.</p>
<p>5. Add lexicon tags at the beginning and the end of hebrew-without-numbers.<br />
6. Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ saxonb-xslt -s:hebrew-without-numbers -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:hebrew.xml</code></p></blockquote>
<p>7. ISO 639-1 <a href="http://en.wikipedia.org/wiki/Hebrew_language">language</a> code is <code>he</code>.<br />
8. I need a table for grapheme to phoneme conversion. Maybe I will <a href="http://en.wikipedia.org/wiki/Hebrew_alphabet#Pronunciation">use this table</a>. There are several tables available at Wikipedia. I am not sure which one I should use. I have an idea: as far as I know, Yiddish and Hebrew <a href="http://spirit.blau.in/simon/2012/01/03/ralfs-yiddish-dictionary/">share the same alphabet</a>. This means I could try to use the Yiddish <a href="http://spirit.blau.in/simon/files/2012/01/improve-yiddish.xsl_.zip">improve-yiddish.xsl</a> style sheet:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ saxonb-xslt -s:hebrew.xml -xsl:'/home/ubuntu/Documents/2011-II/Yiddish/dictionaries/improve-yiddish.xsl' -o:hebrew-dictionary.xml</code></p></blockquote>
<p>The result is that most Hebrew letters have been converted into IPA. There is only one Hebrew letter that hasn&#8217;t been converted: [א]. I will add a phone for this letter to the <code>.xsl</code> style sheet and save it under the name <code>improve-hebrew.xsl</code>. Now I try it again:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Hebrew$ saxonb-xslt -s:hebrew.xml -xsl:'<a href='http://spirit.blau.in/simon/files/2012/01/improve-hebrew.xsl_.zip'>improve-hebrew.xsl</a>' -o:<a href="http://script.blau.in/hebrew-dictionary.xml.bz2" title="IPA phonetic dictionary, first draft">hebrew-dictionary.xml</a></code></p></blockquote>
<p>The result is not so good: Maybe I should adjust the grapheme to phoneme conversion rules for modern standard Israeli Hebrew. Or is this not necessary? I think for a first draft I can use the Yiddish transformation rules.</p>
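<p>The clean-up in step 4 (digits with <code>sed</code>, then commas and slashes with Geany) could presumably also be done in one pass with <code>tr</code>. This is only a sketch: the helper name <code>clean_wordlist</code> is made up, and the demo uses ASCII stand-ins instead of real Hebrew entries. Since <code>tr</code> deletes only the listed ASCII characters, and ASCII bytes never occur inside a multi-byte UTF-8 sequence, the Hebrew letters in a UTF-8 file should be left untouched.</p>

```shell
#!/bin/sh
# Sketch only: delete digits, commas and slashes in one pass.
# clean_wordlist is a hypothetical helper name.
clean_wordlist() {
    # tr -d removes each listed character; 0-9 is a range.
    tr -d '0-9,/'
}

# Demo with made-up ASCII stand-ins for dictionary entries.
# grep -c . counts the lines that still contain at least one character.
printf '3\nshalom/1,2\nbayit\n' | clean_wordlist | grep -c .
```

<p>The count at the end is a quick way to check how many words remain in the list.</p>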
<p><strong>B. <a href="http://script.blau.in/hebrew-dictionary.xml.bz2">Download</a> the dictionary. Import it into <a href="http://simon-listens.org/index.php?id=122&#038;L=1">simon</a> as shadow dictionary.</strong></p>
<p><a href="http://spirit.blau.in/simon/files/2012/01/hebrew-SAMPA.jpg"><img src="http://spirit.blau.in/simon/files/2012/01/hebrew-SAMPA-255x300.jpg" alt="" title="hebrew-SAMPA" width="255" height="300" class="alignleft size-medium wp-image-5723" /></a>Take a look at the result: The left column contains 43933 Hebrew words. The pronunciation column contains the corresponding SAMPA transcriptions. The category column is unused (or to be more exact: displays just <code>Unknown</code>) since the source PLS dictionary contains no <code>role</code> attributes.</p>
<div style="clear:both"></div>
<p>Now you know how I created the dictionary. And you know what the result looks like in simon. This dictionary uses more or less Yiddish pronunciation because I was too lazy to adjust it to modern standard Israeli Hebrew. It shouldn&#8217;t be a problem to adjust the style sheet <code>improve-hebrew.xsl</code> so that the phoneme results are better.</p>
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2012/01/10/ralfs-hebrew-dictionary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ralf&#8217;s Belarusian dictionary</title>
		<link>http://spirit.blau.in/simon/2012/01/09/ralfs-belarusian-dictionary/</link>
		<comments>http://spirit.blau.in/simon/2012/01/09/ralfs-belarusian-dictionary/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 20:27:52 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[dictionary]]></category>
		<category><![CDATA[Belarusian]]></category>
		<category><![CDATA[iconv]]></category>
		<category><![CDATA[PLS]]></category>
		<category><![CDATA[unmunch]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5699</guid>
		<description><![CDATA[This article explains how I create this PLS dictionary and what the imported result looks like. A. Creation of the Belarusian PLS dictionary: 1. Get spelling dictionary. I choose the official orthography. 2. License is LGPL (see hyph_be_BY.dic). I am allowed to &#8220;convert any LGPLed piece of software into a GPLed piece of software.&#8221; I [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains how I create this PLS dictionary and what the imported result looks like.</p>
<p><strong>A. Creation of the Belarusian PLS dictionary:</strong></p>
<p>1. <a href="http://extensions.services.openoffice.org/en/project/dict-be-official">Get</a> spelling dictionary. I choose the official orthography.<br />
2. License is <a href="http://www.gnu.org/licenses/lgpl.html">LGPL</a> (see <code>hyph_be_BY.dic</code>). I am <a href="http://en.wikipedia.org/wiki/GNU_Lesser_General_Public_License#Differences_from_the_GPL">allowed</a> to <em>&#8220;convert any LGPLed piece of software into a GPLed piece of software.&#8221;</em> I <a href="http://spirit.blau.in/simon/2010/05/22/ralfs-northern-sotho-dictionary/">did this before</a>. And I will do it again. This means that I get a spelling dictionary that is licensed under the LGPL. And I will produce a pronunciation dictionary that is licensed under the GPLv3. By the way, all my dictionaries are <a href="http://spirit.blau.in/simon/import-pls-dictionary/" title="phonetic IPA dictionaries">licensed under the GPLv3</a>.<br />
3. Extract <a href="http://extensions.services.openoffice.org/en/download/2976">dict-be-official.oxt</a>.</p>
<p>4. The file <code>be-official.aff</code> is encoded in UTF-8. The file <code>be-official.dic</code> may be encoded in ISO-8859-1. At least this encoding is displayed by <a href="http://www.geany.org/">Geany</a>. I believe that <code>be-official.dic</code> is encoded in microsoft-cp1251. I had this encoding before (<a title="Macedonian PLS dictionary - Pronunciation Lexicon Specification" href="http://spirit.blau.in/simon/2010/04/24/ralfs-macedonian-dictionary/">Macedonian</a> and <a href="http://spirit.blau.in/simon/2010/04/17/creating-ralfs-bulgarian-dictionary/" title="phonetic dictionary">Bulgarian</a>).<br />
Now it is time to use the Ubuntu terminal:<br />
<code><a href="http://en.wikipedia.org/wiki/Cd_(command)">cd</a> /home/ubuntu/Documents/2011-II/Belarusian<br />
<a href="http://en.wikipedia.org/wiki/Iconv#Examples">iconv</a> -f cp1251 -t UTF-8 &lt;be-official.dic &gt;belarusian-utf8.dic</code><br />
The text file <code>belarusian-utf8.dic</code> looks fine.</p>
<p>5. Now I change the line <code>SET microsoft-cp1251</code> in the file <code>be-official.aff</code> into <code>SET UTF-8</code><br />
6. I don&#8217;t know whether the next step is necessary. I could convert the file <code>hyph_be_BY.dic</code> from cp1251 into UTF-8. At the moment, I skip this step.</p>
<p>7. Ubuntu terminal: <code>unmunch belarusian-utf8.dic be-official.aff &gt; belarusian-wordlist</code> I think that this step wasn&#8217;t necessary. It didn&#8217;t extract the word list. At the moment, I have a word list of 1.5 million words. This is far too many. The target size is 400,000 words.</p>
<p>8. To reduce the dictionary size, I use a <a href="http://www.unix.com/shell-programming-scripting/24845-remove-every-third-line-file.html">tip</a> that I found. Ubuntu terminal:</p>
<blockquote><p><code><a href="http://en.wikipedia.org/wiki/Sed#Usage">sed</a> -n 'p;N;N;N' belarusian-wordlist <a href="http://en.wikipedia.org/wiki/Redirection_%28computing%29#Redirecting_standard_input_and_standard_output">></a> belarusian-wordlist-reduced</code></p></blockquote>
<p>Yes, it worked. The command keeps only every fourth line, so the word list now contains about 391,000 words. This is a good basis for a PLS dictionary.</p>
<p>9. Add lexicon elements at the beginning and the end of belarusian-wordlist-reduced.<br />
10. Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Belarusian$ saxonb-xslt -s:belarusian-wordlist-reduced -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:belarusian.xml</code></p></blockquote>
<p>11. <a href="http://en.wikipedia.org/wiki/Belarusian_language">Language</a> code is be.<br />
12. I will use <a href="http://en.wikipedia.org/wiki/Belarusian_alphabet#Letters">this</a> table for grapheme to phoneme mapping.<br />
13. Creation of the phoneme elements:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Belarusian$ saxonb-xslt -s:belarusian.xml -xsl:'<a href='http://spirit.blau.in/simon/files/2012/01/improve-belarusian.xsl_.zip'>improve-belarusian.xsl</a>' -o:<a href="http://script.blau.in/belarusian-dictionary.xml.bz2" title="Belarusian IPA phonetic dictionary">belarusian-dictionary.xml</a></code></p></blockquote>
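<p>To make it clearer what the <code>sed</code> command in step 8 does, here is a small sketch on a toy input, together with an <code>awk</code> one-liner that should select the same lines. The two helper names are made up for this example, and the <code>awk</code> variant is an alternative that was not used for this dictionary.</p>

```shell
#!/bin/sh
# Sketch only: two ways to keep every fourth line of a word list.
# Both helper names are hypothetical.
every_fourth_sed() {
    # p prints the current line; each N then pulls one more line into the
    # pattern space, and -n keeps those extra lines from being printed.
    sed -n 'p;N;N;N'
}

every_fourth_awk() {
    # NR is the current line number; lines 1, 5, 9, ... satisfy NR % 4 == 1.
    awk 'NR % 4 == 1'
}

# Demo on the numbers 1 to 9: both commands print 1, 5 and 9.
printf '%s\n' 1 2 3 4 5 6 7 8 9 | every_fourth_sed
printf '%s\n' 1 2 3 4 5 6 7 8 9 | every_fourth_awk
```

<p>Keeping one line in four explains why a list of roughly 1.5 million words shrinks to about 391,000 words.</p>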
<p><strong>B. <a href="http://script.blau.in/belarusian-dictionary.xml.bz2" title="Belarusian - Pronunciation Lexicon Specification">Download</a> and import the dictionary.</strong> </p>
<p><a href="http://spirit.blau.in/simon/files/2012/01/belarusian.jpg"><img src="http://spirit.blau.in/simon/files/2012/01/belarusian-300x275.jpg" alt="" title="belarusian" width="300" height="275" class="alignleft size-medium wp-image-5709" /></a>Let&#8217;s take a look at the result. The left column contains 391669 Belarusian words. The pronunciation column contains the corresponding SAMPA transcriptions. All entries in the third column are marked as &#8220;Unknown&#8221;. This is because the Belarusian PLS dictionary doesn&#8217;t contain any <code>role</code> attribute.</p>
<div style="clear:both"></div>
<p>Now you know how I created the dictionary. And you have an impression of what the result looks like when imported into simon.</p>
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2012/01/09/ralfs-belarusian-dictionary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ralf&#8217;s Asturian dictionary</title>
		<link>http://spirit.blau.in/simon/2012/01/05/ralfs-asturian-dictionary/</link>
		<comments>http://spirit.blau.in/simon/2012/01/05/ralfs-asturian-dictionary/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 20:30:43 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[dictionary]]></category>
		<category><![CDATA[Asturian]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[PLS]]></category>
		<category><![CDATA[saxonb-xslt]]></category>
		<category><![CDATA[unmunch]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5678</guid>
		<description><![CDATA[This article explains how I create the Asturian PLS dictionary, and says a few words about the import into simon. A. How I create the dictionary: 1. Get spelling dictionary. 2. Check license. It is GPLv3. 3. Extract asturianu.oxt. 4. Language code is ast. 5. Ubuntu terminal: ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ unmunch ast.dic ast.aff > asturian-wordlist The result is a [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains how I create the Asturian PLS dictionary, and says a few words about the import into simon.</p>
<p>A. How I create the dictionary:<br />
1. <a href="http://extensions.services.openoffice.org/en/project/asturianu">Get</a> spelling dictionary.<br />
2. Check license. It is <a href="http://extensions.services.openoffice.org/en/project/license/3932">GPLv3</a>.<br />
3. Extract <a href="http://extensions.services.openoffice.org/en/download/5129">asturianu.oxt</a>.<br />
4. <a href="http://en.wikipedia.org/wiki/Asturian_language">Language</a> code is ast.<br />
5. Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ unmunch ast.dic ast.aff > asturian-wordlist</code></p></blockquote>
<p>The result is a file of 70MB with more than 5 million words. This word list is too big. I should reduce it. I had the <a href="http://spirit.blau.in/simon/2010/04/13/removing-words-from-latin-dictionary/">same problem</a> with my Latin dictionary. I had to reduce the size.</p>
<p>6. Add lexicon elements at the beginning/end of asturian-wordlist.</p>
<p>7. Generate .xml document with lexicon, lexeme and grapheme elements:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:asturian.xml</code></p></blockquote>
<p>I got an error message because Java ran out of memory (&#8220;Java heap space&#8221;). I think that I should reduce the file size with <a href="http://en.wikipedia.org/wiki/Grep">grep</a>. Alternatively, I could install VisualVM. I think I will work with grep:<br />
a. Remove lines that begin with <code>l'</code>: <code>ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ grep -v ^l\' asturian-wordlist > asturian-wordlist-02</code><br />
b. Remove lines that begin with <code>t'</code>: <code>grep -v ^t\' asturian-wordlist-02 > asturian-wordlist-03</code><br />
c. Remove lines that begin with <code>s'</code>: <code>grep -v ^s\' asturian-wordlist-03 > asturian-wordlist-04</code><br />
d. Remove lines that begin with <code>m'</code>: <code>grep -v ^m\' asturian-wordlist-04 > asturian-wordlist-05</code><br />
e. Remove lines that begin with <code>n'</code>: <code>grep -v ^n\' asturian-wordlist-05 > asturian-wordlist-06</code><br />
f. Remove lines that begin with <code>d'</code>: <code>grep -v ^d\' asturian-wordlist-06 > asturian-wordlist-07</code><br />
g. Remove lines that begin with <code>qu'</code>: <code>grep -v ^qu\' asturian-wordlist-07 > asturian-wordlist-08</code><br />
h. Remove lines that begin with <code>p'</code>: <code>grep -v ^p\' asturian-wordlist-08 > asturian-wordlist-09</code><br />
The dictionary will contain 1.1 million words. I think that number is acceptable.</p>
<p>8. And now Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian-wordlist-09 -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:asturian.xml</code></p></blockquote>
<p>This command creates a PLS dictionary without phoneme elements. The phoneme elements will be added later.</p>
<p>9. I will use <a href="http://en.wikipedia.org/wiki/Asturian_language#Orthography">this</a> table for grapheme to phoneme conversion. Here is the command that creates the phoneme elements:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian.xml -xsl:'<a href='http://spirit.blau.in/simon/files/2012/01/improve-asturian.xsl_.zip'>improve-asturian.xsl</a>' -o:<a href="http://script.blau.in/asturian-dictionary.xml.bz2">asturian-dictionary.xml</a></code></p></blockquote>
<p>10. I tried to import the resulting dictionary into simon. Unfortunately, simon stopped responding after the import had finished. I assume that the dictionary is way too big. I have to reduce its size again.<br />
a. Remove lines that contain <code>'l</code>: <code>grep -v \'l asturian-wordlist-09 > asturian-wordlist-10</code><br />
b. Continue to reduce the size of the word list: <code>grep -v ylu asturian-wordlist-10 > asturian-wordlist-11</code><br />
c. This isn&#8217;t enough, I have to remove about 80,000 words: <code>grep -v les asturian-wordlist-11 > asturian-wordlist-12</code><br />
d. Remove 136,000 words: <code>grep -v mos asturian-wordlist-12 > asturian-wordlist-13</code><br />
e. Remove 67,000 words: <code>grep -v los asturian-wordlist-13 > asturian-wordlist-14</code><br />
f. Remove 265,000 words: <code>grep -v es asturian-wordlist-14 > asturian-wordlist-15</code><br />
You see it is a lot of work to get a dictionary size that is suitable for simon. At the moment, the word list contains 539,000 words. Is this number OK, or should I continue to reduce the size? I think I will try the import again. First, I create an <code>.xml</code> file again:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Asturian/dictionaries$ saxonb-xslt -s:asturian-wordlist-15 -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:asturian.xml</code></p></blockquote>
<p>And now I repeat step 9. The file <code>asturian-dictionary.xml</code> has a size of 45 MB. I hope that this size is OK.</p>
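<p>The eight <code>grep</code> calls in step 7 (a to h) could probably be collapsed into a single call with an extended regular expression. This is only a sketch: the helper name <code>drop_apostrophe_prefixes</code> and the demo entries are made up, and I haven&#8217;t run it on the full 70 MB word list.</p>

```shell
#!/bin/sh
# Sketch only: one extended regular expression instead of eight grep calls.
# drop_apostrophe_prefixes is a hypothetical helper name.
drop_apostrophe_prefixes() {
    # -E enables alternation with |, -v inverts the match, ^ anchors at line start.
    grep -Ev "^(l|t|s|m|n|d|qu|p)'"
}

# Demo with made-up entries: only the lines without such a prefix remain.
printf '%s\n' "l'agua" "agua" "qu'esti" "casa" | drop_apostrophe_prefixes
```

<p>One pass over the file also avoids the intermediate files <code>asturian-wordlist-02</code> to <code>asturian-wordlist-08</code>.</p>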
<p>B. <a href="http://script.blau.in/asturian-dictionary.xml.bz2">Download the dictionary</a>. Import it into simon.</p>
<p><a href="http://spirit.blau.in/simon/files/2012/01/asturian.jpg"><img src="http://spirit.blau.in/simon/files/2012/01/asturian-238x300.jpg" alt="" title="asturian" width="238" height="300" class="alignleft size-medium wp-image-5693" /></a>Take a look at the result. In the left column, you can see the Asturian words. This dictionary contains 539928 words. The right column contains the corresponding SAMPA transcriptions.</p>
<div style="clear:both"></div>
<p>As you can see, it was a lot of work to reduce the size of the dictionary. At least it now has a size that isn&#8217;t too big for simon.</p>
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2012/01/05/ralfs-asturian-dictionary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ralf&#8217;s Yiddish dictionary</title>
		<link>http://spirit.blau.in/simon/2012/01/03/ralfs-yiddish-dictionary/</link>
		<comments>http://spirit.blau.in/simon/2012/01/03/ralfs-yiddish-dictionary/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 17:05:18 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[dictionary]]></category>
		<category><![CDATA[PLS]]></category>
		<category><![CDATA[saxonb-xslt]]></category>
		<category><![CDATA[unmunch]]></category>
		<category><![CDATA[yiddish]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5663</guid>
		<description><![CDATA[This article explains some details about the creation of the dictionary, and what the result looks like in simon. A. How I create Ralf's Yiddish dictionary: 1. Get spelling dictionary. 2. License is GPLv3. 3. Extract jidysz.net.ooo.spellchecker.oxt. 4. Ubuntu terminal: cd /home/ubuntu/Documents/2011-II/Yiddish/dictionaries sudo apt-get install hunspell-tools unmunch yi.dic yi.aff &#62; yiddish-wordlist 5. Add &#60;lexicon&#62; at [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains some details about the creation of the dictionary, and what the result looks like in simon.</p>
<p>A. How I create <code>Ralf's Yiddish dictionary</code>:</p>
<p>1. <a href="http://extensions.services.openoffice.org/en/project/jidysz-net-ooo-spellchecker">Get</a> spelling dictionary.<br />
2. License is <code>GPLv3</code>.<br />
3. Extract <a href="http://extensions.services.openoffice.org/en/download/4324"><code>jidysz.net.ooo.spellchecker.oxt</code></a>.<br />
4. Ubuntu terminal:<br />
<code>cd /home/ubuntu/Documents/2011-II/Yiddish/dictionaries<br />
sudo apt-get install hunspell-tools<br />
unmunch yi.dic yi.aff &gt; yiddish-wordlist</code><br />
5. Add  <code>&lt;lexicon&gt;</code> at the beginning of yiddish-wordlist. Add <code>&lt;/lexicon&gt;</code> at the end of this file.<br />
6. Generate <code>.xml</code> document with lexicon, lexeme and grapheme elements:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Yiddish/dictionaries$ saxonb-xslt -s:yiddish-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-xml-file.xsl' -o:yiddish.xml</code></p></blockquote>
<p>7. ISO 639-1 <a href="http://en.wikipedia.org/wiki/Yiddish_language">language code</a> is yi.<br />
8. I think I will use <a href="http://en.wikipedia.org/wiki/Yiddish_orthography#The_Yiddish_alphabet">this table</a> as source for the grapheme to phoneme mapping.<br />
9. Ubuntu terminal:</p>
<blockquote><p><code>ubuntu@ubuntu:~/Documents/2011-II/Yiddish/dictionaries$ saxonb-xslt -s:yiddish.xml -xsl:'<a href='http://spirit.blau.in/simon/files/2012/01/improve-yiddish.xsl_.zip'>improve-yiddish.xsl</a>' -o:yiddish-dictionary.xml</code></p></blockquote>
<p>B. <a href="http://script.blau.in/yiddish-dictionary.xml.bz2">Download the dictionary</a>, and <a href="http://spirit.blau.in/simon/2010/06/17/tutorial-import-german-dictionary/">import</a> it into simon.</p>
<p><a href="http://spirit.blau.in/simon/files/2012/01/yiddish.jpg"><img src="http://spirit.blau.in/simon/files/2012/01/yiddish-243x300.jpg" alt="" title="yiddish" width="243" height="300" class="alignleft size-medium wp-image-5670" /></a>Take a look at the result. The left column contains the Yiddish words. This dictionary contains 99980 words. The right column contains the corresponding SAMPA transcription.<br />
<a href="http://en.wikipedia.org/wiki/Yiddish_language">Yiddish</a> is written in the <a href="http://en.wikipedia.org/wiki/Hebrew_alphabet">Hebrew alphabet</a>, which runs from right to left. The corresponding SAMPA transcriptions are written from left to right, so the phoneme order should be fine.</p>
<div style="clear:both"></div>
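<p>The remark about phoneme order can be checked with a small Python sketch. It relies on the fact that Unicode text is stored in logical (reading) order, so the first character of a Hebrew-script string is the first letter that is read, even though it is displayed at the right. The sample word is only an illustration and is not taken from the dictionary.</p>

```python
import unicodedata

# Sample word in Hebrew script, given as escapes in reading order:
# shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"

# Index 0 is the first letter read, although it is displayed rightmost.
print(unicodedata.name(word[0]))

# Hebrew letters carry the bidirectional class "R" (right-to-left);
# this affects display only, not the order of characters in the string.
print([unicodedata.bidirectional(ch) for ch in word])
```

<p>A grapheme-to-phoneme conversion that walks through the string from index 0 therefore emits the phonemes in spoken order.</p>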
<p>There are a lot of other PLS dictionaries available. <a href="http://spirit.blau.in/simon/import-pls-dictionary/">Find the PLS dictionary</a> that suits your language.</p>
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2012/01/03/ralfs-yiddish-dictionary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schott’s German dictionary 0.2.8</title>
		<link>http://spirit.blau.in/simon/2011/11/01/schott%e2%80%99s-german-dictionary-0-2-8/</link>
		<comments>http://spirit.blau.in/simon/2011/11/01/schott%e2%80%99s-german-dictionary-0-2-8/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 17:05:20 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[dictionary]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5569</guid>
		<description><![CDATA[Here is how I create Schott’s German dictionary 0.2.8 (with the style sheet improve-german.xsl): 1. Replace 152 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'planung')"&#62;&#60;xsl:value-of select="replace($sierra, 'planʊŋ','plaːnʊŋ')"/&#62;&#60;/xsl:when&#62; 2. Replace 178 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'fußball')"&#62;&#60;xsl:value-of select="replace($sierra, 'fʊsbal','fuːsbal')"/&#62;&#60;/xsl:when&#62; A lot of other small changes have been made. Please import Schott’s German dictionary (author: Kai Schott) into simon.]]></description>
			<content:encoded><![CDATA[<p>Here is how I create <a href="http://script.blau.in/german-dictionary.xml.bz2">Schott’s German dictionary</a> 0.2.8 (with the style sheet <code>improve-german.xsl</code>):</p>
<p>1. Replace 152 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'planung')"&gt;&lt;xsl:value-of select="replace($sierra, 'planʊŋ','plaːnʊŋ')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>2. Replace 178 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'fußball')"&gt;&lt;xsl:value-of select="replace($sierra, 'fʊsbal','fuːsbal')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>A lot of other small changes have been made. Please <a href="http://spirit.blau.in/simon/2010/06/17/tutorial-import-german-dictionary/">import Schott’s German dictionary</a> (author: Kai Schott) into simon.</p>
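<p>For readers who do not use XSLT, the two rules above can be sketched in Python. This is an illustration under assumptions, not part of <code>improve-german.xsl</code>: <code>RULES</code> and <code>apply_rules</code> are hypothetical names, and the sketch assumes that the rules sit in one <code>xsl:choose</code>, where only the first matching <code>xsl:when</code> is applied.</p>

```python
# Each rule: (grapheme substring, old phoneme substring, new phoneme substring)
RULES = [
    ("planung", "planʊŋ", "plaːnʊŋ"),
    ("fußball", "fʊsbal", "fuːsbal"),
]


def apply_rules(grapheme, phoneme, rules=RULES):
    """Return the phoneme string after applying the first rule whose test matches."""
    lowered = grapheme.lower()
    for needle, old, new in rules:
        if needle in lowered:
            # XPath replace() takes a regular expression; these patterns
            # contain no regex metacharacters, so a literal replace behaves
            # the same way here.
            return phoneme.replace(old, new)
    return phoneme  # no rule matched: keep the transcription unchanged


print(apply_rules("Planung", "planʊŋ"))
```

<p>If the grapheme matches but the old phoneme substring is absent, the transcription is returned unchanged, which mirrors what <code>replace()</code> does in the style sheet.</p>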
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2011/11/01/schott%e2%80%99s-german-dictionary-0-2-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schott&#8217;s German dictionary 0.2.7</title>
		<link>http://spirit.blau.in/simon/2011/05/25/schotts-german-dictionary-0-2-7/</link>
		<comments>http://spirit.blau.in/simon/2011/05/25/schotts-german-dictionary-0-2-7/#comments</comments>
		<pubDate>Wed, 25 May 2011 17:11:38 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[dictionary]]></category>
		<category><![CDATA[de]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5430</guid>
		<description><![CDATA[With the style sheet improve-german.xsl, I create Schott's German dictionary version 0.2.7: 1. Replace 174 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'strom')"&#62;&#60;xsl:value-of select="replace($sierra, 'ʃtʀɔm','ʃtʀoːm')"/&#62;&#60;/xsl:when&#62; 2. Replace 36 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'knüll')"&#62;&#60;xsl:value-of select="replace($sierra, 'knyl','knʏl')"/&#62;&#60;/xsl:when&#62; 3. Replace 39 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'knüppel')"&#62;&#60;xsl:value-of select="replace($sierra, 'knypəl','knʏpəl')"/&#62;&#60;/xsl:when&#62; 4. Replace 55 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'pflück')"&#62;&#60;xsl:value-of select="replace($sierra, 'pflyk','pflʏk')"/&#62;&#60;/xsl:when&#62; 5. Replace 23 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'kollision')"&#62;&#60;xsl:value-of [...]]]></description>
			<content:encoded><![CDATA[<p>With the style sheet <code>improve-german.xsl</code>, I create <code><a href="http://script.blau.in/german-dictionary.xml.bz2">Schott's German dictionary</a></code> version 0.2.7: <span id="more-5430"></span></p>
<p>1. Replace 174 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'strom')"&gt;&lt;xsl:value-of select="replace($sierra, 'ʃtʀɔm','ʃtʀoːm')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>2. Replace 36 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'knüll')"&gt;&lt;xsl:value-of select="replace($sierra, 'knyl','knʏl')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>3. Replace 39 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'knüppel')"&gt;&lt;xsl:value-of select="replace($sierra, 'knypəl','knʏpəl')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>4. Replace 55 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'pflück')"&gt;&lt;xsl:value-of select="replace($sierra, 'pflyk','pflʏk')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>5. Replace 23 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'kollision')"&gt;&lt;xsl:value-of select="replace($sierra, 'kɔliːzɪoːn','kɔlizi̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>6. Replace 49 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'kolonial')"&gt;&lt;xsl:value-of select="replace($sierra, 'koːloːniːal','koloni̯aːl')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>7. Replace 220 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'stürz')"&gt;&lt;xsl:value-of select="replace($sierra, 'ʃtyʀt͡s','ʃtʏʀt͡s')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>8. Replace 93 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'hotel')"&gt;&lt;xsl:value-of select="replace($sierra, 'hoːtəl','hotɛl')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
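<p>Match counts like the ones above could be estimated before running a rule with a short Python sketch. It is only an illustration: <code>count_matches</code> and <code>ENTRIES</code> are hypothetical names, the transcriptions are the uncorrected forms quoted in rules 1 and 8, and the compound entry is an assumed example. The sketch counts an entry only when the grapheme test matches and the old phoneme substring is actually present; how the counts in this post were obtained is not stated.</p>

```python
# Sample (grapheme, phoneme) pairs for illustration only.
ENTRIES = [
    ("Strom", "ʃtʀɔm"),
    ("Stromnetz", "ʃtʀɔmnɛt͡s"),  # assumed compound, not from the dictionary
    ("Hotel", "hoːtəl"),
]


def count_matches(entries, needle, old):
    """Count entries whose grapheme contains needle and whose phoneme contains old."""
    count = 0
    for grapheme, phoneme in entries:
        if needle in grapheme.lower() and old in phoneme:
            count += 1
    return count


print(count_matches(ENTRIES, "strom", "ʃtʀɔm"))
print(count_matches(ENTRIES, "hotel", "hoːtəl"))
```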
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2011/05/25/schotts-german-dictionary-0-2-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mapping between IPA and SAMPA</title>
		<link>http://spirit.blau.in/simon/2011/03/10/mapping-between-ipa-and-sampa/</link>
		<comments>http://spirit.blau.in/simon/2011/03/10/mapping-between-ipa-and-sampa/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 13:55:58 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[dictionary]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5433</guid>
		<description><![CDATA[I agree with this statement: The IPA phonemes are completely language independent. Therefore the mapping between IPA &#60;-&#62; SAMPA is fixed. The mapping should, of course, be extended to cover all available IPA phonemes. However, as the conversion is static, I don&#8217;t think the rules need to be dynamic. If you have any IPA phonemes that [...]]]></description>
			<content:encoded><![CDATA[<p>I agree with this <a href="https://sourceforge.net/tracker/?func=detail&#038;atid=935106&#038;aid=3064781&#038;group_id=190872">statement</a>:</p>
<blockquote><p>The IPA phonemes are completely language independent. Therefore the mapping<br />
between IPA &lt;-&gt; SAMPA is fixed. The mapping should, of course, be extended<br />
to cover all available IPA phonemes. However, as the conversion is static,<br />
I don&#8217;t think the rules need to be dynamic.</p>
<p>If you have any IPA phonemes that are not correctly converted to SAMPA<br />
during the import, please report those as bugs!</p></blockquote>
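<p>A static mapping of this kind can be sketched as a simple lookup table. The Python below is an illustration only, not simon's import code: <code>IPA_TO_XSAMPA</code> and <code>ipa_to_xsampa</code> are hypothetical names, and the table is a small subset of the standard X-SAMPA correspondences, chosen to cover symbols that appear in the German rules on this site.</p>

```python
# Small, fixed subset of IPA -> X-SAMPA correspondences.
IPA_TO_XSAMPA = {
    "ʃ": "S", "ŋ": "N", "ç": "C",
    "ɪ": "I", "ʊ": "U", "ʏ": "Y",
    "ɛ": "E", "ɔ": "O", "ə": "@",
    "ø": "2", "œ": "9", "ː": ":",
}


def ipa_to_xsampa(ipa):
    """Convert character by character; symbols missing from the table pass through."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


print(ipa_to_xsampa("plaːnʊŋ"))
```

<p>Because the table does not depend on the language, extending it is a matter of adding entries. A complete converter would also have to handle symbols made of several code points, such as affricates written with a tie bar, which this character-by-character sketch does not attempt.</p>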
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2011/03/10/mapping-between-ipa-and-sampa/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schott&#8217;s German dictionary 0.2.6</title>
		<link>http://spirit.blau.in/simon/2011/03/05/schotts-german-dictionary-0-2-6/</link>
		<comments>http://spirit.blau.in/simon/2011/03/05/schotts-german-dictionary-0-2-6/#comments</comments>
		<pubDate>Sat, 05 Mar 2011 13:49:40 +0000</pubDate>
		<dc:creator>producer</dc:creator>
				<category><![CDATA[dictionary]]></category>
		<category><![CDATA[de]]></category>

		<guid isPermaLink="false">http://spirit.blau.in/simon/?p=5332</guid>
		<description><![CDATA[Here is how I create Schott's German dictionary version 0.2.6. The style sheet improve-german.xsl contains the following transformation rules: 1. Replace 209 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'position')"&#62;&#60;xsl:value-of select="replace($sierra, 'poːziːt͡sɪoːn','pozit͡si̯oːn')"/&#62;&#60;/xsl:when&#62; 2. Replace 983 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'tion')"&#62;&#60;xsl:value-of select="replace($sierra, 't͡sɪoːn','t͡si̯oːn')"/&#62;&#60;/xsl:when&#62; 3. Replace 107 matches: &#60;xsl:when test="starts-with(lower-case(../grapheme), 'dahin')"&#62;&#60;xsl:value-of select="replace($sierra, 'daːɪn','daːhɪn')"/&#62;&#60;/xsl:when&#62; 4. Replace 50 matches: &#60;xsl:when test="contains(lower-case(../grapheme), 'fusion')"&#62;&#60;xsl:value-of select="replace($sierra, 'fuːzɪoːn','fuzi̯oːn')"/&#62;&#60;/xsl:when&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>Here is how I create <a href="http://script.blau.in/german-dictionary.xml.bz2"><code>Schott's German dictionary</code></a> version 0.2.6. The style sheet <a href='http://spirit.blau.in/simon/files/2011/03/improve-german.xsl_.zip'><code>improve-german.xsl</code></a> contains the following transformation rules:</p>
<p>1. Replace 209 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'position')"&gt;&lt;xsl:value-of select="replace($sierra, 'poːziːt͡sɪoːn','pozit͡si̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>2. Replace 983 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'tion')"&gt;&lt;xsl:value-of select="replace($sierra, 't͡sɪoːn','t͡si̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>3. Replace 107 matches:<span id="more-5332"></span></p>
<blockquote><p><code>&lt;xsl:when test="starts-with(lower-case(../grapheme), 'dahin')"&gt;&lt;xsl:value-of select="replace($sierra, 'daːɪn','daːhɪn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>4. Replace 50 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'fusion')"&gt;&lt;xsl:value-of select="replace($sierra, 'fuːzɪoːn','fuzi̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>5. Replace 230 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'würdig')"&gt;&lt;xsl:value-of select="replace($sierra, 'vyʀdɪg','vʏʀdɪg')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>6. Replace 90 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'proportion')"&gt;&lt;xsl:value-of select="replace($sierra, 'pʀoːpɔʀtsɪoːn','pʀopɔʀt͡si̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>7. Replace 40 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'würfe')"&gt;&lt;xsl:value-of select="replace($sierra, 'vyʀfə','vʏʀfə')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>8. Replace 65 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'periode')"&gt;&lt;xsl:value-of select="replace($sierra, 'peːʀɪoːd','pɛʀi̯oːd')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>9. Replace 1768 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'tion')"&gt;&lt;xsl:value-of select="replace($sierra, 'tsɪoːn','t͡si̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>10. Replace 131 matches:</p>
<blockquote><p><code>&lt;xsl:when test="ends-with(lower-case(../grapheme), 'linie') and<br />
ends-with($sierra, 'liːniː')"&gt;&lt;xsl:value-of select="replace($sierra, 'liːniː','liːni̯ə')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>11. Replace 90 matches:</p>
<blockquote><p><code>&lt;xsl:when test="ends-with(lower-case(../grapheme), 'linien') and<br />
ends-with($sierra, 'liːnɪən')"&gt;&lt;xsl:value-of select="replace($sierra, 'liːnɪən','liːni̯ən')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>12. Replace 94 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'flücht')"&gt;&lt;xsl:value-of select="replace($sierra, 'flyçt','flʏçt')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>13. Replace 54 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'flüster')"&gt;&lt;xsl:value-of select="replace($sierra, 'flystəʀ','flʏstəʀ')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>14. Replace 79 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'pension') and<br />
contains($sierra, 'pɛnzɪoːn')"&gt;&lt;xsl:value-of select="replace($sierra, 'pɛnzɪoːn','pɛnzi̯oːn')"/&gt;&lt;/xsl:when&gt;<br />
&lt;xsl:when test="contains(lower-case(../grapheme), 'pension') and<br />
contains($sierra, 'pɛnzjoːn')"&gt;&lt;xsl:value-of select="replace($sierra, 'pɛnzjoːn','pɛnzi̯oːn')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>15. Replace 66 matches:</p>
<blockquote><p><code>&lt;xsl:when test="starts-with(lower-case(../grapheme), 'ferien')"&gt;&lt;xsl:value-of select="replace($sierra, 'feːʀɪən','feːʀi̯ən')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>16. Replace 131 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'höchst')"&gt;&lt;xsl:value-of select="replace($sierra, 'hœçst','høːçst')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>17. Replace 147 matches:</p>
<blockquote><p><code>&lt;xsl:when test="contains(lower-case(../grapheme), 'gehalt')"&gt;&lt;xsl:value-of select="replace($sierra, 'geːalt','gəhalt')"/&gt;&lt;/xsl:when&gt;</code></p></blockquote>
<p>You can see that a lot of improvements have been made. Please <a href="http://spirit.blau.in/simon/import-pls-dictionary/">import the dictionary</a> into simon.</p>
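<p>Rules 10 and 11 combine two anchored tests, one on the spelling and one on the transcription. A Python sketch of rule 10 follows; <code>fix_linie</code> is a hypothetical name, and the sample call uses the uncorrected transcription quoted in the rule. As with XPath <code>replace()</code>, <code>str.replace</code> rewrites every occurrence of the old substring, not only the one at the end of the word.</p>

```python
def fix_linie(grapheme, phoneme):
    """Apply rule 10 only if both the spelling and the transcription end as expected."""
    if grapheme.lower().endswith("linie") and phoneme.endswith("liːniː"):
        return phoneme.replace("liːniː", "liːni̯ə")
    return phoneme  # either test failed: leave the transcription untouched


print(fix_linie("Linie", "liːniː"))
```

<p>The second test keeps the rule from touching entries whose transcription has already been corrected or follows a different pattern.</p>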
]]></content:encoded>
			<wfw:commentRss>http://spirit.blau.in/simon/2011/03/05/schotts-german-dictionary-0-2-6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

