Date: Sat, 27 Jun 2015 09:24:20 -0700 (PDT)
From: sorpa'as plat <sorpaas@gmail.com>
To: lojban@googlegroups.com
Cc: timothy.lawrence@connect.qut.edu.au
Message-Id: <b8e6bb03-604b-48be-8924-29411d8f28fb@googlegroups.com>
In-Reply-To: <BL2PR01MB3704E814FB2B677E0871E7EEBAC0@BL2PR01MB370.prod.exchangelabs.com>
References: <cc8533d8-082e-41e4-90b7-2987d1cdbe85@googlegroups.com>
 <BL2PR01MB3704E814FB2B677E0871E7EEBAC0@BL2PR01MB370.prod.exchangelabs.com>
Subject: Re: [lojban] The Prototype of a Lojban Speech Recognition Tool
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_2937_1911882265.1435422260295"
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com
X-Spam_score: -3.1
X-Spam_score_int: -30
X-Spam_bar: ---

------=_Part_2937_1911882265.1435422260295
Content-Type: multipart/alternative; 
	boundary="----=_Part_2938_207417794.1435422260295"

------=_Part_2938_207417794.1435422260295
Content-Type: text/plain; charset=UTF-8

Hi Timothy,

Thanks for the advice, but I think we will stick with Sphinx for now. 
Sphinx uses hidden Markov acoustic models (HMMs), while Kaldi is mostly 
used with deep neural networks (DNNs). Training DNNs is harder than working 
with HMMs, and we simply lack the manpower for that :)

Wei
mu'o mi'e la sorpa'as

On Saturday, June 27, 2015 at 2:21:03 AM UTC-4, Timothy Lawrence wrote:
>
>  I really like this initiative :)
>  
>
>  A friend of mine is working on speech recognition (for other purposes) 
> and tried Sphinx but ended up changing to 
> http://kaldi.sourceforge.net/about.html, I believe because he found it to 
> be better at recognising speech and/or it was faster. I thought I'd mention 
> it in case you're not fully set on Sphinx.
>  
>  ------------------------------
> *From:* loj...@googlegroups.com on behalf of sorpa'as plat <sor...@gmail.com>
> *Sent:* Saturday, 27 June 2015 1:29 PM
> *To:* loj...@googlegroups.com
> *Subject:* [lojban] The Prototype of a Lojban Speech Recognition Tool
>  
>  Hi all,
>
> I'm trying to build a Lojban speech recognition tool called tersku. Instead
> of building an acoustic model from scratch (which would need a lot of
> manpower and take a long time), the plan is to take the English acoustic
> model (which is pretty mature) and adapt it to Lojban sounds.
>
> A running prototype can be found at https://git.null.tl/tersku.git (use
> git://git.null.tl/tersku.git to clone). The prototype uses an unmodified
> version of CMU's generic English acoustic model, with only the dictionary
> and grammar needed to parse the text "le tanxe be le birka cu cpana le
> tanxe be le botpi". To use it, record yourself reading that text, convert
> the recording to wav format, and replace the
> /resources/org/lojban/tersku/recording.wav file with it. The program will
> output its best "hypothesis" for the text.
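
One way to put a number on how close the hypothesis is to the text that was actually read is a word error rate. The following is only a minimal sketch in Python; it is not part of tersku, and the function name word_error_rate is made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

A result of 0.0 means the hypothesis matches the reference word for word; one wrong word in the twelve-word test sentence gives 1/12.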
>
> The program does not work very well yet, so there is a lot of work left
> and I would appreciate your help. Below are some details of what needs to
> be done.
>
> *About the Program*
> tersku uses CMU's Sphinx speech recognition engine. You can find Sphinx's
> tutorials and documentation at http://cmusphinx.sourceforge.net.
>
> *Adapt the Acoustic Model*
> The adaptation requires some 16 kHz single-channel wav recordings. Help
> would be appreciated if someone can create a collection of Lojban phrase
> recordings. Note that such a collection will benefit the whole Lojban
> community, not just the speech recognition program :)
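
Before a recording goes into the collection, it could be checked against the 16 kHz single-channel requirement. This is a rough sketch using Python's standard wave module; check_recording is a hypothetical helper, not something that exists in tersku.

```python
import wave


def check_recording(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(
                f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(
                f"{channels} channel(s), expected {expected_channels}")
    return problems
```

An empty list means the file has the expected sample rate and channel count; it says nothing about the quality of the recording itself.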
>
> *Finish the Dictionary*
> The dictionary in the prototype is located at
> resources/org/lojban/tersku/jbo-1.dict. Because we are trying to adapt the
> English acoustic model, all the phones are represented in Arpabet
> (https://en.wikipedia.org/wiki/Arpabet). We will need to a) confirm which
> Arpabet symbol represents which Lojban sound, and b) write a program that
> generates an entry for every word in the format "[lojban word] [arpabet
> symbols]". This probably depends on the adaptation of the acoustic model.
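
Step b) could look roughly like the sketch below. The letter-to-Arpabet table is a placeholder guess for illustration only (for example, "x" is mapped to HH because English has no closer sound); step a) still has to confirm or replace every entry. The names to_phones and dict_line are made up here.

```python
# Placeholder mapping: every entry is an unconfirmed guess.
DIPHTHONGS = {"ai": "AY", "au": "AW", "ei": "EY", "oi": "OY"}
LETTERS = {
    "a": "AA", "e": "EH", "i": "IY", "o": "OW", "u": "UW", "y": "AH",
    "b": "B", "c": "SH", "d": "D", "f": "F", "g": "G", "j": "ZH",
    "k": "K", "l": "L", "m": "M", "n": "N", "p": "P", "r": "R",
    "s": "S", "t": "T", "v": "V", "x": "HH", "z": "Z", "'": "HH",
}


def to_phones(word):
    """Convert a Lojban word to a list of Arpabet symbols."""
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIPHTHONGS:
            # Prefer a diphthong over two separate vowels.
            phones.append(DIPHTHONGS[pair])
            i += 2
        elif word[i] in LETTERS:
            phones.append(LETTERS[word[i]])
            i += 1
        else:
            raise ValueError(f"no Arpabet mapping for {word[i]!r} in {word!r}")
    return phones


def dict_line(word):
    """Format one dictionary entry as '[lojban word] [arpabet symbols]'."""
    return f"{word} {' '.join(to_phones(word))}"
```

With this placeholder table, dict_line("tanxe") gives "tanxe T AA N HH EH". Running it over a full word list would produce the dictionary, once the mapping is settled.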
>
> *Finish the Grammar*
> The grammar needs to be written in JSGF format
> (http://cmusphinx.sourceforge.net/wiki/tutoriallm). This hasn't been
> started yet (help needed!).
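
As a starting point, a trivial JSGF grammar can be generated from a word list. The sketch below is an assumption about what a first step might look like, not the real Lojban grammar: it accepts any non-empty sequence of the given words, and the grammar name, rule name and function name are made up.

```python
def jsgf_grammar(name, words):
    """Build a JSGF grammar that accepts one or more of the given words."""
    alternatives = " | ".join(sorted(set(words)))
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <utterance> = ( {alternatives} )+;\n"
    )


if __name__ == "__main__":
    sentence = "le tanxe be le birka cu cpana le tanxe be le botpi"
    print(jsgf_grammar("tersku", sentence.split()))
```

A real grammar would replace the flat word loop with rules that follow Lojban's actual syntax.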
>
> *Correct Me!*
> There must be mistakes and errors both in the code and in the recognition
> details (I'm new to speech recognition!).
>
> Feel free to reach me at this email address or by opening a task at
> https://phabricator.null.tl. I'm really looking forward to a Lojban
> speech recognition tool, because it should be one of the features of Lojban
> :)
>
> Wei
> mu'o mi'e la sorpa'as

-- 
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.

------=_Part_2938_207417794.1435422260295--
------=_Part_2937_1911882265.1435422260295--