Date: Thu, 17 Jul 2008 16:09:36 +0200
From: Arnt Richard Johansen <arj@nvg.org>
To: lojban-list@lojban.org
Subject: [lojban] Re: Lojban Speech Recognition semester-project
Message-ID: <20080717140936.GG2355@nvg.org>
References: <bffd72fa0807161429g7121fd9en6b54c90016fcaa65@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <bffd72fa0807161429g7121fd9en6b54c90016fcaa65@mail.gmail.com>
User-Agent: Mutt/1.4.2.1i

On Wed, Jul 16, 2008 at 11:29:47PM +0200, Nico Möller wrote:

> It would
> be really cool if some of you could actually send us some audio data we can
> work with, if you do so please provide them in the following format:
> 
> - 16-bit mono, 16 kHz
> - preferably raw or WAV data files
> - one sentence per audio file
> - a transcript text file containing one sentence per line + the name of the
> audio file in which the sentence was uttered
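
A contributor could check recordings against the requested format before sending them. Below is a minimal sketch using Python's standard `wave` module. It assumes WAV input (raw files carry no header and cannot be checked this way). It also assumes a tab-separated `filename<TAB>sentence` layout for the transcript, because the request only says that each line holds a sentence plus the name of its audio file. The helper names `check_wav` and `write_transcript` are hypothetical.

```python
import wave

# Target format from the request: 16-bit mono at 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bit
EXPECTED_RATE = 16000      # samples per second


def check_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
        if w.getframerate() != EXPECTED_RATE:
            problems.append(f"{w.getframerate()} Hz, expected 16000 Hz")
    return problems


def write_transcript(entries, out_path):
    """Write one 'filename<TAB>sentence' line per recording.

    The tab-separated layout is an assumption, not part of the request.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for audio_name, sentence in entries:
            f.write(f"{audio_name}\t{sentence}\n")
```

For example, `write_transcript([("s001.wav", "coi rodo")], "transcript.txt")` writes the single line `s001.wav<TAB>coi rodo`. Because there is one sentence per audio file, this gives exactly one transcript line per file.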

We have a few hours of recordings of spontaneous speech, together with transcriptions, here:
http://www.lojban.org/tiki/tiki-index.php?page=Story+Time+With+Uncle+Robin&bl

You "only" need some volunteers to sentence-align it. :-)

Out of curiosity:

Lojban claims to be "self-segregating": if you know the phoneme string and the stress pattern, you also know how to split it into words. Will you be taking advantage of this in your model?

-- 
Arnt Richard Johansen                                http://arj.nvg.org/
"I had to translate this sentence into English because I could not read the
original Sanskrit."        --Douglas Hofstadter