
[lojban] Re: Lojban Speech Recognition semester-project



Coi (hello),

I'm not sure whether you are aware of http://jbobac.lojban.org/, which has some examples with transcripts. A series of words (with transcripts) can be found at http://allalone.org/cizra/, which is intended as a pronunciation guide.

mu'o mi'e .laxris (over; I am .laxris)

Nico Möller wrote:
Random sentences are quite OK; we recorded some sentences from Alice
ourselves. But send us whatever you have got: as long as we get a transcript
of what was uttered, that is totally sufficient.

I know that uncompressed audio files are quite big, but hey, it's only 16-bit
mono, and of course you can compress them using zip, 7z or whatever you like
;). Then it should be no problem to send them by mail. Or you can
use some free file hosting on the web and send us the links. Just be
creative... If none of these methods is appropriate, just send them
in a format (mp3, etc.) that we can convert back into WAVs...

Thanks a lot for your help,
Nico

On Thu, Jul 17, 2008 at 12:36 PM, james riley <jimr1603@gmail.com> wrote:

Are random sentences okay, or should they be part of a larger piece of prose?
I could churn out loads tomorrow (unless something comes up), but I'm AFK
today helping out at my uni. My pronunciation needs practice, but is mostly
okay. Also, WAV files are very big; how do you want us to send you loads of
recordings in WAV?

2008/7/16 Nico Möller <nmoeller@uos.de>:

Hi guys,
We have a request, and hopefully some of you are willing to help us. We
are currently studying cognitive science at the University of Osnabrueck and
participating in a course called "practical natural language processing",
which is a kind of semester project in linguistics. Our group decided to
work on speech recognition, and because Lojban has such nice phonetic
features we chose it as our target language. Unfortunately, we discovered
that there is very little (usable) Lojban audio data on the web, but we
actually need huge amounts of it to feed our training algorithms. It would
be really cool if some of you could send us some audio data we can
work with. If you do, please provide it in the following format:

- 16-bit mono, 16 kHz
- preferably raw or WAV data files
- one sentence per audio file
- a transcript text file containing one sentence per line, plus the name of
the audio file in which the sentence was uttered
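For anyone preparing recordings, a quick check against the format above may save a round trip. Below is a minimal Python sketch, assuming the recordings are WAV (not raw) files and using only the standard-library `wave` module. The helper names `check_wav` and `write_transcript` are hypothetical, and so is the transcript layout (sentence, a tab, then the file name); the request only says each line should hold the sentence plus the file name.

```python
import wave

# Requested format: 16-bit samples, mono, 16 kHz.
SAMPLE_WIDTH_BYTES = 2   # 16 bit = 2 bytes per sample
CHANNELS = 1             # mono
FRAME_RATE = 16000       # 16 kHz


def check_wav(path):
    """Return a list of problems; an empty list means the file matches the requested format."""
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getsampwidth() != SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * w.getsampwidth()} bit, expected 16")
        if w.getnchannels() != CHANNELS:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getframerate() != FRAME_RATE:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 16000")
    return problems


def write_transcript(pairs, out_path):
    """Write one line per recording: the sentence, a tab, then the audio file name.

    Hypothetical layout: the tab separator and column order are assumptions.
    `pairs` is an iterable of (wav_file_name, sentence) tuples.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for wav_name, sentence in pairs:
            f.write(f"{sentence}\t{wav_name}\n")
```

Running `check_wav` over every file before zipping them up would catch, for example, a 44.1 kHz stereo recording straight from a default recorder setting.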

Everybody who sends us usable data will be mentioned by name in our
final term paper, which will be published at the end of this month (so, as
you can see, we really need the data quickly).

Thanks a lot for your effort,
Nico & Thorben


