Date: Mon, 29 Jun 2015 11:11:53 -0700 (PDT)
From: la durka <durka42@gmail.com>
To: lojban@googlegroups.com
Message-Id: <c014c099-0b1c-47e4-b7cf-462e666db165@googlegroups.com>
In-Reply-To: <cc8533d8-082e-41e4-90b7-2987d1cdbe85@googlegroups.com>
References: <cc8533d8-082e-41e4-90b7-2987d1cdbe85@googlegroups.com>
Subject: [lojban] Re: The Prototype of a Lojban Speech Recognition Tool
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_89_1792634663.1435601513401"
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com
X-Spam_score: -3.1
X-Spam_score_int: -30
X-Spam_bar: ---

------=_Part_89_1792634663.1435601513401
Content-Type: multipart/alternative; 
	boundary="----=_Part_90_1762697278.1435601513401"

------=_Part_90_1762697278.1435601513401
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Awesome initiative!

I had some trouble getting the program to run. Here are my full
instructions; let me know if there is an easier way:

$ git clone git://git.null.tl/tersku.git
$ cd tersku
$ mvn package
$ mvn install
$ mv /path/to/my/recording.wav resources/org/lojban/tersku/recording.wav
$ java -cp ~/.m2/repository/edu/cmu/sphinx/sphinx4-core/1.0-SNAPSHOT/sphinx4-core-1.0-SNAPSHOT.jar:target/tersku-1.0-SNAPSHOT.jar:resources com.lojban.tersku.Main

Also, when I run the program on your WAV file, it outputs a bunch of
hypotheses, each a one- or two-word phrase. How can I tell which one it
considers the best hypothesis, and do you know why it never produces more
than two words?

My ideas for future expansion:
- use the corpus readings (on YouTube) as a base for phrase recordings
- explore options for automatically translating the grammar from PEG to JSGF

Lastly, if I may critique your pronunciation (which seems important if
we're building a phoneme database...), it sounds to me like you pronounce
word-final {e} as {ei} or {ai}, whereas it should be more like "eh" (see
http://mw.lojban.org/extensions/ilmentufa/cirkotci.html).

mu'o mi'e la durkavore

On Friday, June 26, 2015, 23:29:20 (UTC-4), sorpa'as plat wrote:
>
> Hi all,
>
> I'm trying to build a Lojban speech recognition tool called tersku. Instead
> of building an acoustic model by hand (which may require a lot of manpower
> and take a long time), the approach is to take the English acoustic model
> (which is pretty mature) and adapt it for Lojban sounds.
>
> A running prototype can be found at https://git.null.tl/tersku.git (use
> *git://git.null.tl/tersku.git* to clone). The prototype uses an unmodified
> version of CMU's generic English acoustic model, with only the dictionary
> and grammar needed to parse the text "le tanxe be le birka cu cpana le
> tanxe be le botpi". To use it, record yourself reading the text "le tanxe
> be le birka cu cpana le tanxe be le botpi", convert the recording to WAV
> format, and replace the resources/org/lojban/tersku/recording.wav file
> with it. The program will output the best "hypothesis" for the text.
>
> The program does not work very well yet, which means there is a lot of
> work to do, and I would appreciate your help. Below are some details of
> the things that need to be done.
>
> *About the Program*
> tersku uses CMU's Sphinx speech recognition engine. You can find Sphinx's
> tutorials and documentation at http://cmusphinx.sourceforge.net.
>
> *Adapt the Acoustic Model*
> The adaptation requires a set of 16 kHz, single-channel (mono) WAV
> recordings. Help would be appreciated if someone can create a collection
> of Lojban phrase recordings. Note that such a collection would benefit the
> whole Lojban community, not just the speech recognition program :)
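Contributed recordings need to match the format required above. The following is a minimal sketch, using only the standard javax.sound.sampled API, of checking a WAV file before it is dropped into resources/. It is not part of tersku. The 16-bit signed PCM requirement is an assumption: it is the usual choice for Sphinx models, but the post itself only specifies 16 kHz and a single channel. The class name WavCheck is hypothetical.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

public class WavCheck {
    /**
     * True if the format is 16 kHz, single-channel, signed PCM.
     * ASSUMPTION: 16-bit samples; the post only specifies 16 kHz and mono.
     */
    static boolean isSphinxReady(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getChannels() == 1
                && f.getSampleSizeInBits() == 16;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the file the prototype reads; pass another path to check it instead.
        File wav = new File(args.length > 0 ? args[0] : "resources/org/lojban/tersku/recording.wav");
        // Reads only the file header, not the audio data.
        AudioFormat f = AudioSystem.getAudioFileFormat(wav).getFormat();
        System.out.println(f);
        System.out.println(isSphinxReady(f)
                ? "OK: 16 kHz, mono, 16-bit PCM"
                : "Needs conversion to 16 kHz, mono, 16-bit PCM");
    }
}
```

Running it with no arguments from the repository root checks the recording the prototype will actually use.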
>
> *Finish the Dictionary*
> The dictionary in the prototype is located at
> resources/org/lojban/tersku/jbo-1.dict. Because we are trying to adapt the
> English acoustic model, all the phones are represented in Arpabet
> (*https://en.wikipedia.org/wiki/Arpabet*). We will need to a) confirm which
> Arpabet symbol represents which Lojban sound, and b) write a program that
> generates entries for all Lojban words in the form "[lojban word] [arpabet
> symbols]". This probably depends on how the acoustic model is adapted.
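Task b) could start from something like the sketch below. It turns a Lojban word into a Sphinx-style dictionary line: the word, then its phones separated by spaces. The letter-to-Arpabet table is entirely hypothetical, because confirming it is exactly task a). In particular, Arpabet has no velar fricative, so {x} is only approximated here with HH. The class and method names are illustrative as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JboDictSketch {
    // HYPOTHETICAL mapping of Lojban diphthongs to Arpabet; still to be confirmed (task a).
    static final Map<String, String> DIPHTHONGS =
            Map.of("ai", "AY", "ei", "EY", "oi", "OY", "au", "AW");

    // HYPOTHETICAL mapping of single Lojban letters to Arpabet phones.
    // {x} has no Arpabet equivalent and is approximated with HH;
    // the apostrophe (an [h]-like sound) also maps to HH.
    static final Map<Character, String> LETTERS = Map.ofEntries(
            Map.entry('a', "AA"), Map.entry('e', "EH"), Map.entry('i', "IY"),
            Map.entry('o', "OW"), Map.entry('u', "UW"), Map.entry('y', "AH"),
            Map.entry('b', "B"),  Map.entry('c', "SH"), Map.entry('d', "D"),
            Map.entry('f', "F"),  Map.entry('g', "G"),  Map.entry('j', "ZH"),
            Map.entry('k', "K"),  Map.entry('l', "L"),  Map.entry('m', "M"),
            Map.entry('n', "N"),  Map.entry('p', "P"),  Map.entry('r', "R"),
            Map.entry('s', "S"),  Map.entry('t', "T"),  Map.entry('v', "V"),
            Map.entry('x', "HH"), Map.entry('z', "Z"),  Map.entry('\'', "HH"));

    /** Builds one dictionary line: the word, then its phones separated by spaces. */
    static String toDictLine(String word) {
        String w = word.toLowerCase();
        List<String> phones = new ArrayList<>();
        int i = 0;
        while (i < w.length()) {
            // Try a two-letter diphthong first.
            if (i + 1 < w.length()) {
                String pair = w.substring(i, i + 2);
                if (DIPHTHONGS.containsKey(pair)) {
                    phones.add(DIPHTHONGS.get(pair));
                    i += 2;
                    continue;
                }
            }
            // Single letter; characters with no mapping (e.g. '.' or ',') are skipped.
            String phone = LETTERS.get(w.charAt(i));
            if (phone != null) {
                phones.add(phone);
            }
            i++;
        }
        return word + " " + String.join(" ", phones);
    }

    public static void main(String[] args) {
        // The words of the prototype's test sentence.
        for (String word : List.of("le", "tanxe", "be", "birka", "cu", "cpana", "botpi")) {
            System.out.println(toDictLine(word));
        }
    }
}
```

A real generator would read the word list from a file and write the lines to jbo-1.dict. Keeping the mapping in plain tables means it can be revised as task a) settles which phones fit best.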
>
> *Finish the Grammar*
> The grammar needs to be written in JSGF format
> (http://cmusphinx.sourceforge.net/wiki/tutoriallm). This hasn't been
> started yet, so help is needed here!
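As a starting point, the sketch below writes out a tiny JSGF grammar that accepts sentences shaped like the prototype's test sentence. It is a toy: the grammar name "jbo" and the rule names <brivla>, <sumti> and <bridi> are hypothetical, it is nowhere near Lojban's real (PEG) grammar, and it does not attempt the PEG-to-JSGF translation. The generated text would be saved as a .gram file, for example jbo.gram, for Sphinx to load.

```java
import java.util.List;

public class JsgfSketch {
    /**
     * Builds a tiny, HYPOTHETICAL JSGF grammar covering sentences like
     * "le tanxe be le birka cu cpana le tanxe be le botpi".
     */
    static String buildGrammar(String grammarName, List<String> brivla) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        // Alternatives are separated by '|'.
        sb.append("<brivla> = ").append(String.join(" | ", brivla)).append(";\n");
        // Square brackets mark an optional part.
        sb.append("<sumti> = le <brivla> [be le <brivla>];\n");
        // Public rules are the ones a recognizer can activate directly.
        sb.append("public <bridi> = <sumti> cu <brivla> <sumti>;\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildGrammar("jbo", List.of("tanxe", "birka", "cpana", "botpi")));
    }
}
```

For the test sentence, <sumti> matches "le tanxe be le birka", then "cu", then <brivla> matches "cpana", and a second <sumti> matches "le tanxe be le botpi".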
>
> *Correct Me!*
> There must be mistakes and errors both in the code and in the recognition
> details (I'm new to speech recognition!).
>
> Feel free to reach me at this email address or by opening a task at
> https://phabricator.null.tl. I'm really looking forward to a Lojban speech
> recognition tool, because it should be one of the features of Lojban :)
>
> Wei
> mu'o mi'e la sorpa'as
>

-- 
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.

------=_Part_90_1762697278.1435601513401--
------=_Part_89_1792634663.1435601513401--