[lojban] Re: Lojban Speech Recognition semester-project
On Wed, Jul 16, 2008 at 11:29:47PM +0200, Nico Möller wrote:
> It would
> be really cool if some of you could actually send us some audio data we can
> work with, if you do so please provide them in the following format:
>
> - 16-bit mono, 16 kHz
> - preferably raw or WAV data files
> - one sentence per audio file
> - a transcript text file containing one sentence per line + the name of the
> audio file in which the sentence was uttered
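For anyone preparing recordings, here is a minimal sketch of how a WAV file could be checked against the requested format using Python's standard `wave` module. The helper name `check_wav` and the constants are illustrative, not part of the project. It only covers WAV, because headerless raw files carry no format information to inspect.

```python
import wave

# Format requested in the quoted message: 16-bit samples, mono, 16 kHz.
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample (16 bit)
EXPECTED_CHANNELS = 1       # mono
EXPECTED_RATE = 16000       # samples per second (16 kHz)

def check_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width {8 * w.getsampwidth()} bit, expected 16 bit")
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {w.getframerate()} Hz, expected 16000 Hz")
    return problems
```

Running it over each file before sending and printing any non-empty result would catch stereo or 44.1 kHz recordings early.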
We have a few hours of recordings of spontaneous speech, together with transcriptions, here:
http://www.lojban.org/tiki/tiki-index.php?page=Story+Time+With+Uncle+Robin&bl
You "only" need some volunteers to sentence-align it. :-)
Out of curiosity:
Lojban claims to be "self-segregating", which means that if you know the phoneme string, and you know the stress pattern, you also know how to separate it into words. Will you be taking advantage of this in your model?
--
Arnt Richard Johansen http://arj.nvg.org/
"I had to translate this sentence into English because I could not read the
original Sanskrit." --Douglas Hofstadter