
[lojban] Re: Lojban Speech Recognition semester-project



A Lojban pangram: .o'i mu xagji sofybakni cu zvati le purdi (Watch out, five hungry Soviet cows are in the garden)

On Mon, Jul 28, 2008 at 1:02 AM, Steve Sloan <steve@finagle.org> wrote:
Nico Möller wrote:
Unfortunately we discovered that there is very little (usable) Lojban audio data on the web, but we need huge amounts of it to feed our training algorithms. It would be really cool if some of you could send us some audio data we can work with,

Instead of collecting random bits of audio, it occurs to me that the community could devise a short sample corpus of Lojban text that could then be recorded by speakers with a wide variety of accents, speech rhythms, mispronunciations, etc.

A good place to start would be a Lojban pangram[0], but an ideal training set would include most or all legal two-letter combinations. Would it be crazy to consider the shortest meaningful text that includes all cmavo and lujvo?  Probably ...
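To make the coverage idea concrete, here is a minimal Python sketch for checking a candidate corpus. It assumes the Lojban alphabet is the 23 letters a b c d e f g i j k l m n o p r s t u v x y z plus the apostrophe, and treats "." and "," as word separators rather than letters. It does not encode which letter pairs are actually legal; the target pair set in the example is hypothetical, and a real one would have to be built from the phonology rules. The function names are illustrative only.

```python
import re

# Assumed Lojban alphabet: 23 letters plus the apostrophe (h, q, w are unused).
# Pause marks "." and "," are not counted as letters.
LOJBAN_LETTERS = frozenset("abcdefgijklmnoprstuvxyz'")
_WORD_RE = re.compile(r"[abcdefgijklmnoprstuvxyz']+")


def missing_letters(text: str) -> set[str]:
    """Letters of the assumed alphabet that never occur in `text`."""
    return set(LOJBAN_LETTERS - set(text.lower()))


def letter_pairs(text: str) -> set[str]:
    """Adjacent two-letter combinations found inside words of `text`.

    Each regex match is one unbroken run of letters, so pairs never
    span a space or a pause mark.
    """
    pairs: set[str] = set()
    for run in _WORD_RE.findall(text.lower()):
        pairs.update(run[i:i + 2] for i in range(len(run) - 1))
    return pairs


def pair_coverage(corpus: list[str], target: set[str]) -> float:
    """Fraction of the `target` pairs that appear somewhere in `corpus`."""
    if not target:
        return 1.0
    seen: set[str] = set()
    for sentence in corpus:
        seen |= letter_pairs(sentence)
    return len(seen & target) / len(target)


if __name__ == "__main__":
    pangram = ".o'i mu xagji sofybakni cu zvati le purdi"
    print(missing_letters(pangram))  # set(): every letter appears
    # Hypothetical target set; "sm" does not occur in the pangram.
    target = {"xa", "zv", "kn", "pu", "sm"}
    print(pair_coverage([pangram], target))  # 0.8
```

Growing the corpus one sentence at a time and rerunning `pair_coverage` shows which pairs are still missing, which is roughly the "shortest text" search described above, done greedily by hand.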


[0] a short text containing every letter in the alphabet, e.g. http://en.wikipedia.org/wiki/The_quick_brown_fox

-- Steve




To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.