Date: Thu, 17 Jul 2008 16:09:36 +0200
From: Arnt Richard Johansen
To: lojban-list@lojban.org
Subject: [lojban] Re: Lojban Speech Recognition semester-project
On Wed, Jul 16, 2008 at 11:29:47PM +0200, Nico Möller wrote:
> It would
> be really cool if some of you could actually send us some audio data we can
> work with, if you do so please provide them in the following format:
>
> - 16bit mono, 16khz
> - preferable raw or wav data files
> - one sentence per audio file
> - a transcript text file containing one sentence per line + the name of the
>   audio file in which the sentence was uttered

We have a few hours of recordings of spontaneous speech, together with
transcriptions, here:

http://www.lojban.org/tiki/tiki-index.php?page=Story+Time+With+Uncle+Robin&bl

You "only" need some volunteers to sentence-align it. :-)

Out of curiosity: Lojban claims to be "self-segregating", which means that if
you know the phoneme string, and you know the stress pattern, you also know
how to separate it into words. Will you be taking advantage of this in your
model?

-- 
Arnt Richard Johansen                   http://arj.nvg.org/
"I had to translate this sentence into English because I could not read the
original Sanskrit." --Douglas Hofstadter
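For anyone preparing recordings in the format requested in the quoted message (16-bit mono at 16 kHz, one sentence per file, plus a transcript file), here is a minimal sketch using only Python's standard `wave` module. It checks a WAV file's header against the requested format and builds one transcript line. The transcript layout (audio file name, a tab, then the sentence) is an assumption, since the request does not say how the name and sentence should be separated; the file name `s001.wav` and the helper names are hypothetical. Raw (headerless) files cannot be checked this way, because they carry no format information.

```python
import io
import wave

# Format requested in the quoted message: 16-bit mono PCM at 16 kHz.
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16 bit
REQUIRED_FRAME_RATE = 16000    # samples per second (Hz)


def check_wav(source):
    """Return a list of format problems for a WAV file (path or file object).

    An empty list means the header matches the requested format.
    """
    problems = []
    with wave.open(source, "rb") as w:
        if w.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"expected mono, got {w.getnchannels()} channels")
        if w.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(f"expected 16-bit samples, got {8 * w.getsampwidth()}-bit")
        if w.getframerate() != REQUIRED_FRAME_RATE:
            problems.append(f"expected 16000 Hz, got {w.getframerate()} Hz")
    return problems


def transcript_line(audio_name, sentence):
    """One transcript line: audio file name, a tab, the sentence (assumed layout)."""
    if "\t" in sentence or "\n" in sentence:
        raise ValueError("sentence must not contain tabs or newlines")
    return f"{audio_name}\t{sentence}\n"


def make_wav(channels, sampwidth, framerate, nframes=160):
    """Build a silent in-memory WAV file, handy for trying out check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        w.writeframes(b"\x00" * (nframes * channels * sampwidth))
    buf.seek(0)  # rewind so the file can be read back
    return buf


if __name__ == "__main__":
    print(check_wav(make_wav(1, 2, 16000)))   # → []
    print(check_wav(make_wav(2, 2, 44100)))   # two problems: channels and rate
    print(repr(transcript_line("s001.wav", "coi rodo")))
```

In practice one would call `check_wav("s001.wav")` on each recording before sending it, and write one `transcript_line(...)` per audio file into the transcript.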