[lojban] Re: Lojban Speech Recognition semester-project
Nico Möller wrote:
> Unfortunately we discovered that there is very few (usable) lojban audio
> data on the web, but we actually need huge amounts of them to feed our
> training algorithms. It would be really cool if some of you could
> actually send us some audio data we can work with,
Instead of collecting random bits of audio, it occurs to me that the
community could devise a short sample corpus of Lojban text, which could
then be recorded by many speakers covering a wide variety of accents,
speech rhythms, mispronunciations, etc.
A good place to start would be a Lojban pangram[0], but an ideal
training set would include most/all legal two-letter combinations.
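To make that concrete, here is a rough sketch of how a candidate text could be checked for letter and two-letter coverage. The names are mine and purely illustrative. It assumes the alphabet is the 23 Latin letters Lojban uses (no h, q, w) plus the apostrophe, and it treats everything else (spaces, periods, commas) as a word break. It does not try to encode which pairs are legal; that set has to be supplied by someone who has worked it out from the morphology rules.

```python
# Assumption: 23 Latin letters used by Lojban (no h, q, w) plus the apostrophe.
LETTERS = "abcdefgijklmnoprstuvxyz'"


def normalize(text):
    """Lowercase the text and split it into words made only of LETTERS.

    Any other character (space, period, comma, ...) ends the current word.
    """
    words, current = [], []
    for ch in text.lower():
        if ch in LETTERS:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def missing_letters(text):
    """Letters of the assumed alphabet that never occur in the text."""
    seen = set("".join(normalize(text)))
    return sorted(set(LETTERS) - seen)


def bigrams(text):
    """Set of adjacent letter pairs, counted within words only."""
    pairs = set()
    for word in normalize(text):
        pairs.update(a + b for a, b in zip(word, word[1:]))
    return pairs


def bigram_coverage(text, legal_pairs):
    """Fraction of the caller-supplied legal pairs that the text contains."""
    if not legal_pairs:
        raise ValueError("legal_pairs must not be empty")
    return len(bigrams(text) & set(legal_pairs)) / len(legal_pairs)
```

A text is a pangram under these assumptions when missing_letters() returns an empty list, and bigram_coverage() gives a quick way to compare candidate texts against whatever list of legal pairs the community settles on.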
Would it be crazy to consider the shortest meaningful text that included
all cmavo and lujvo? Probably ...
[0] a short text containing every letter in the alphabet, e.g.
http://en.wikipedia.org/wiki/The_quick_brown_fox
-- Steve