Re: [lojban] Re: IRC logs and text archives - volunteers wanted
At 10:43 PM 11/13/02 -0600, Jordan wrote:
On Wed, Nov 13, 2002 at 11:23:11PM -0500, Bob LeChevalier-Logical Language Group wrote:
> Robin P says that there has been a lot of activity on IRC for a while, but
> that in general he is not logging it and does not know of anyone else who is.
>
> Does anyone have a collection of Lojban IRC logs? We are going to be
> looking for Lojban text corpora in the next several weeks for dictionary
> work, and if a lot of Lojban conversation is taking place on IRC, that
> conversation should be included in the corpora.
I have essentially noninterrupted logs (10 megs of em) since Sun
May 12 08:40:20 2002, when I first joined.
That's a lot! I wonder if Robin has room for that much (and more if it
keeps accumulating at that rate).
What percentage of it would you say is IN Lojban, as opposed to being
discussion in English (or other languages) ABOUT Lojban?
However, I wonder what the interest in such text could be?
When we say "let usage decide", "usage" is NOT limited to major translation
efforts. If we look at the text archives of stuff on the list, and
translations, it is heavily dominated by a couple of Lojbanists (Nick and
Goran in the early days, Jorge and xod more recently). Robin P. has
pointed out that there are people active on IRC that are not active on the
list and in other forums, and this suggests that we would have a much
broader spectrum of usage, from more members of the community, than we can
get from the existing text archives.
It's all 'conversation quality',
Conversation is a rather important form of language usage, is it not? The
question is not whether its quality is "conversational" but whether it
represents "skilled usage", and that obviously has to be evaluated by
looking at the whole text of the person who wrote it, as well as the
audience he was writing to, rather than a single snippet of
conversation out of context.
"Conversation quality" actually represents a very desirable thing in a
corpus of usage. If the speakers are skilled users, it represents more
closely the way fluent use of the language works, whereas translations and
other non-real-time writings are NOT usually "fluent" but rather
"considered efforts". When we are looking at how the language usage
reflects "logic" we may want to focus on considered usage; when we want to
look at how people tackle problems of idiomatic expression, we can compare
conversational usage to the comparable idioms of the native language of the
speaker.
and anyone who wants some of that sort
of Lojban text can just go on irc (at the right times), and there'll
likely be a few people around to talk to bau la lojban.
The point is not to merely be able to find sample Lojban texts, but to be
able to assemble as large a corpus of Lojban usage as possible, so we can
go delving to find out if certain obscure (in meaning) cmavo have been
used, and in what manner they have been used by multiple people. We want
to be able to determine NOT what jboske says the word "should" mean, but
what usage has said it "does" mean to people.
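
As a rough illustration of that kind of search, here is a minimal Python sketch that reports which speakers in a log used a given word, and in which lines. It assumes each log line looks like "<nick> message text"; real IRC clients write other formats, so the pattern would need adjusting. The function name speakers_using, the nicks, and the sample lines are made up for the example and do not come from any actual log.

```python
import re
from collections import defaultdict

# Assumption: each spoken line looks like "<nick> message text".
# Real IRC log formats vary (timestamps, different brackets, etc.).
LINE_RE = re.compile(r"^<([^>]+)>\s+(.*)$")

def speakers_using(word, log_lines):
    """Map each nick to the message texts in which `word` occurs as a token."""
    hits = defaultdict(list)
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip joins, parts, and other non-message lines
        nick, text = match.groups()
        # Split on whitespace and trim surrounding punctuation;
        # apostrophes inside a word (as in "vo'a") are left alone.
        tokens = [t.strip(".,!?\"()").lower() for t in text.split()]
        if word in tokens:
            hits[nick].append(text)
    return dict(hits)

# Invented sample data, for illustration only.
sample = [
    "<nick_a> mi prami vo'a",
    "<nick_b> coi rodo",
    "<nick_c> vo'a is a tricky cmavo",
    "*** nick_d has joined #lojban",
]
result = speakers_using("vo'a", sample)
for nick, lines in sorted(result.items()):
    print(nick, lines)
```

A match only shows that the word appeared. Whether it was used IN Lojban or merely talked ABOUT in English (as in the third sample line) still has to be judged by reading the surrounding text.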
An upcoming major push on the Lojban dictionary requires that we be able to
find out if words have been used, and whether they've been used in the way
Lojbab intended as opposed to other plausible ways to interpret the words
that appear in the gismu and cmavo list which some people have understood
differently than Lojbab intended %^)
Nick has cited the actual usage of "vo'a" as a proper use of corpora.
Once we move in dictionary writing from prescriptive language definition to
descriptive reflection of actual usage, this will become even more
important. Defining lujvo is more of a descriptive effort, since the place
structure rules in CLL are just guidelines.
Or is this for word frequency-type infos?
That too is a valid use, though not the one I had in mind. If your 10 megs
is substantially Lojban, it is decidedly better data than Lojban List,
which has a very low percentage of actual Lojban text, and much of it is
snippets and word-proposals and repeated quotations that can seriously skew
any word frequency analysis.
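
To make the skew concrete, here is a small Python sketch of a word count that skips quoted lines and verbatim repeats before counting. It assumes quoted material is marked with a leading ">" as in mail replies, and that a repeat is an exact duplicate of an earlier line; both are simplifications. The function name word_frequencies and the sample lines are invented for the example.

```python
from collections import Counter

def word_frequencies(lines):
    """Count words, ignoring quoted lines and exact repeats of earlier lines."""
    seen = set()
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text or text.startswith(">"):
            continue  # assumption: ">" marks material quoted from another message
        if text in seen:
            continue  # drop verbatim repeats so they are not counted twice
        seen.add(text)
        counts.update(text.lower().split())
    return counts

# Invented sample data, for illustration only.
sample = [
    "coi rodo",
    "> coi rodo",
    "coi rodo",
    "mi tavla do",
]
freq = word_frequencies(sample)
print(freq.most_common())
```

Without the two filters, "coi" and "rodo" would each be counted three times in this sample instead of once.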
Another possible use is for conversation examples for further efforts at a
Lojban textbook. Authentic conversation is far more interesting to learn
from than canned "dialogs" that don't actually represent what any
*normal* person would say in a conversation. %^)
Anyway I'm happy to provide them if someone wants them.
We definitely will want them - heck, *I* want them for the LLG archive, but
I think an on-line archive is at least as important as my having files here
on my computer constituting the "official archive".
We need to find someone willing to index them (and perhaps to weed out any
logs that do not have any substantial Lojban text - discussions about the
language are interesting but are not a corpus of language usage), and to
put them on a site where they can be looked at (lojban.org or
elsewhere). And if they get put on a web site, I'd like the group I've
asked for (the one to maintain a list of web sites with Lojban text) to
include it.
lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org