From: Steve Sloan
Date: Sun, 27 Jul 2008 10:02:05 -0700
To: lojban-list@lojban.org
Subject: [lojban] Re: Lojban Speech Recognition semester-project

Nico Möller wrote:
> Unfortunately we discovered that there is very few (usable) lojban audio
> data on the web, but we actually need huge amounts of them to feed our
> training algorithms.
> It would be really cool if some of you could
> actually send us some audio data we can work with,

Instead of collecting random bits of audio, it occurs to me that the
community could devise a short sample corpus of Lojban text that could
then be recorded by speakers with a wide variety of accents, speech
rhythms, mispronunciations, etc. A good place to start would be a Lojban
pangram[0], but an ideal training set would include most/all legal
two-letter combinations.

Would it be crazy to consider the shortest meaningful text that included
all cmavo and lujvo? Probably ...

[0] a short text containing every letter in the alphabet, e.g.
http://en.wikipedia.org/wiki/The_quick_brown_fox

-- Steve
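As a rough illustration of the coverage idea above, here is a minimal
Python sketch for checking a candidate text: which letters it still
lacks (the pangram test) and which two-letter combinations it contains.
Everything in it is an assumption for illustration only. The function
names are made up, the alphabet is taken to be the 23 Lojban letters plus
the apostrophe, and the set of "legal" combinations is not encoded at
all; the caller has to supply whatever list of wanted pairs they trust.

```python
import re

# Assumption: the 23 Lojban letters plus the apostrophe. The period and
# comma are left out because they mark pauses and syllable breaks.
LOJBAN_LETTERS = set("abcdefgijklmnoprstuvxyz'")


def words(text):
    """Lowercase the text and split it into runs of Lojban letters."""
    return re.findall(r"[abcdefgijklmnoprstuvxyz']+", text.lower())


def missing_letters(text):
    """Letters that never occur in the text (empty set means pangram)."""
    seen = set("".join(words(text)))
    return LOJBAN_LETTERS - seen


def digraphs(text):
    """All adjacent two-letter combinations found inside words."""
    found = set()
    for w in words(text):
        found.update(w[i:i + 2] for i in range(len(w) - 1))
    return found


def uncovered(text, wanted):
    """Members of `wanted` (a set of digraphs) absent from the text."""
    return set(wanted) - digraphs(text)
```

For example, digraphs("coi rodo") yields the pairs co, oi, ro, od and do,
and uncovered("coi rodo", {"co", "kl"}) reports that "kl" still needs a
sentence. Pairs are only counted inside words here; whether combinations
across word boundaries should count is a separate design question.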