Date: Mon, 29 Jun 2015 11:11:53 -0700 (PDT)
From: la durka
To: lojban@googlegroups.com
Subject: [lojban] Re: The Prototype of a Lojban Speech Recognition Tool

Awesome initiative!

I had some trouble getting the program to run. Here are my full instructions, let me know if there is an easier way:

$ git clone git://git.null.tl/tersku.git
$ cd tersku
$ mvn package
$ mvn install
$ mv /path/to/my/recording.wav resources/org/lojban/tersku/recording.wav
$ java -cp ~/.m2/repository/edu/cmu/sphinx/sphinx4-core/1.0-SNAPSHOT/sphinx4-core-1.0-SNAPSHOT.jar:target/tersku-1.0-SNAPSHOT.jar:resources com.lojban.tersku.Main

Also, when I run the program on your WAV file, I see it outputting a bunch of hypotheses as one- or two-word phrases. How can I tell which it thinks is the best hypothesis, and do you know why it won't go over two words?

My ideas for future expansion:
- use the corpus readings (on YouTube) as a base for phrase recordings
- explore options for automatically translating the grammar from PEG to JSGF

Lastly, if I may critique your pronunciation (which seems important if we're building a phoneme database...), it sounds to me like you pronounce word-final {e} as {ei} or {ai}, whereas it should be more like "eh" (see here: http://mw.lojban.org/extensions/ilmentufa/cirkotci.html).

mu'o mi'e la durkavore

On Friday, 26 June 2015, 23:29:20 (UTC-4), sorpa'as plat wrote:
Hi all,

I'm trying to build a Lojban speech recognition tool called tersku. Instead of building an acoustic model by hand (which may need a lot of manpower and take a long time), the idea is to take the English acoustic model (which is pretty mature) and adapt it for Lojban sounds.

A running prototype can be found at https://git.null.tl/tersku.git (use git://git.null.tl/tersku.git to clone). The prototype uses an unmodified version of CMU's generic English acoustic model, with only the dictionary and grammar needed to parse the text "le tanxe be le birka cu cpana le tanxe be le botpi". To use it, record yourself reading that text, convert the recording to WAV format, and replace the /resources/org/lojban/tersku/recording.wav file with it. The program will output the best "hypothesis" for the text.

The program does not work really well. That means there's lots of work to do, and I would appreciate your help. Below are some details of things to be done.

About the Program
tersku uses CMU's Sphinx speech recognition engine. You can find Sphinx's tutorials and documentation at http://cmusphinx.sourceforge.net.

Adapt the Acoustic Model
The adaptation requires some 16 kHz single-channel WAV recordings. Help would be appreciated if someone can create a Lojban phrase recording collection. Note that a phrase recording collection would benefit the whole Lojban community, not just the speech recognition program :)
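
Before contributing a recording, it may help to check that it matches the format mentioned above. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` is hypothetical and not part of tersku, and it only checks the two properties named here (sample rate and channel count).

```python
import wave


def check_wav(path, rate=16000, channels=1):
    """Return a list of reasons the file does not match the expected format.

    An empty list means the file is 16 kHz single-channel, which is the
    format the adaptation step is said to require.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz"
            )
        if wav.getnchannels() != channels:
            problems.append(
                f"file has {wav.getnchannels()} channel(s), expected {channels}"
            )
    return problems
```

Calling `check_wav("resources/org/lojban/tersku/recording.wav")` from the repository root would then list any mismatch for the prototype's recording.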

Finish the Dictionary
The dictionary in the prototype is located at resources/org/lojban/tersku/jbo-1.dict. Because we are trying to adapt the English acoustic model, all the phones are represented in Arpabet (https://en.wikipedia.org/wiki/Arpabet). We will need to a) confirm which Arpabet symbol represents which Lojban sound, and b) write a program that generates all the words in the form "[lojban word] [arpabet symbols]". This probably depends on the adaptation of the acoustic model.
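
Step b) could look roughly like the sketch below. The letter-to-Arpabet table is an assumption made up for illustration, not a confirmed mapping (confirming it is step a)). In particular, English has no [x] sound, so `HH` is only a stand-in for {x}. The function names are hypothetical, and glides such as {ia} or {ua} are not handled.

```python
# Tentative mapping: every entry is a guess that still needs confirming.
DIPHTHONGS = {"ai": "AY", "ei": "EY", "oi": "OY", "au": "AW"}
LETTERS = {
    "a": "AA", "e": "EH", "i": "IY", "o": "OW", "u": "UW", "y": "AH",
    "b": "B", "c": "SH", "d": "D", "f": "F", "g": "G", "j": "ZH",
    "k": "K", "l": "L", "m": "M", "n": "N", "p": "P", "r": "R",
    "s": "S", "t": "T", "v": "V", "z": "Z",
    "x": "HH",  # placeholder: no English phone matches Lojban {x}
    "'": "HH",
}


def to_arpabet(word):
    """Convert one Lojban word to a list of Arpabet symbols."""
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIPHTHONGS:
            # Two adjacent vowel letters forming a diphthong map to one phone.
            phones.append(DIPHTHONGS[pair])
            i += 2
        elif word[i] in ".,":
            # Pause and syllable-break marks produce no phone here.
            i += 1
        else:
            # Raises KeyError for any letter outside the table.
            phones.append(LETTERS[word[i]])
            i += 1
    return phones


def dict_line(word):
    """Format one dictionary entry as "[lojban word] [arpabet symbols]"."""
    return f"{word} {' '.join(to_arpabet(word))}"


# The words of the prototype's test sentence.
for w in ["le", "tanxe", "be", "birka", "cu", "cpana", "botpi"]:
    print(dict_line(w))
```

Mapping {e} to `EH` rather than `EY` follows the pronunciation note earlier in this thread.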

Finish the Grammar
The grammar needs to be written in JSGF format (http://cmusphinx.sourceforge.net/wiki/tutoriallm). This hasn't been started yet (and needs help!).
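
As a starting point well short of a full Lojban grammar, a JSGF file that accepts only a fixed list of phrases can be generated mechanically. This is a sketch under assumptions: the grammar name `tersku`, the rule name `<phrase>`, and the helper `jsgf_grammar` are all hypothetical, and the prototype's actual grammar file may look different.

```python
def jsgf_grammar(name, phrases):
    """Build a JSGF grammar whose single public rule matches any given phrase."""
    # In JSGF, "|" separates alternatives and binds more loosely than a
    # sequence of words, so each phrase is one complete alternative.
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <phrase> = {alternatives};\n"
    )


print(jsgf_grammar(
    "tersku",
    ["le tanxe be le birka cu cpana le tanxe be le botpi"],
))
```

Translating the real grammar from PEG, as suggested earlier in the thread, would replace the flat phrase list with nested rules.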

Correct Me!
There must be mistakes and errors both in the code and in the recognition details (I'm new to speech recognition!).

Feel free to reach me at this email address or by opening a task at https://phabricator.null.tl. I'm really looking forward to a Lojban speech recognition tool, because it should be one of the features of Lojban :)

Wei
mu'o mi'e la sorpa'as
