From: Robert LeChevalier
Date: Thu, 14 Nov 2002 05:05:58 -0500
To: lojban@yahoogroups.com
Subject: Re: [lojban] Re: IRC logs and text archives - volunteers wanted

At 10:43 PM 11/13/02 -0600, Jordan wrote:
>On Wed, Nov 13, 2002 at 11:23:11PM -0500, Bob LeChevalier-Logical Language
>Group wrote:
> > Robin P says that there has been a lot of activity on IRC for a while, but
> > that in general he is not logging it and does not know of anyone else who is.
> >
> > Does anyone have a collection of Lojban IRC logs?  We are going to be
> > looking for Lojban text corpora in the next several weeks for dictionary
> > work, and if a lot of Lojban conversation is taking place on IRC, that
> > conversation should be included in the corpora.
>
>I have essentially noninterrupted logs (10 megs of em) since Sun
>May 12 08:40:20 2002, when I first joined.

That's a lot!  I wonder if Robin has room for that much (and more if it keeps accumulating at that rate).

What percentage of it would you say is IN Lojban, as opposed to being discussion in English (or other languages) ABOUT Lojban?

>However, I wonder what the interest in such text could be?

When we say "let usage decide", "usage" is NOT limited to major translation efforts. If we look at the text archives of list postings and translations, they are heavily dominated by a handful of Lojbanists (Nick and Goran in the early days, Jorge and xod more recently). Robin P. has pointed out that there are people active on IRC who are not active on the list or in other forums, and this suggests that IRC logs would give us a much broader spectrum of usage, from more members of the community, than we can get from the existing text archives.

>It's all 'conversation quality',

Conversation is a rather important form of language usage, is it not? The question is not whether its quality is "conversational" but whether it represents "skilled usage", and that obviously has to be evaluated by looking at the whole text of the person who wrote it, as well as the audience he was writing for, rather than at a single snippet of conversation out of context.

"Conversation quality" is actually a very desirable thing in a corpus of usage. If the speakers are skilled users, it represents more closely the way fluent use of the language works, whereas translations and other non-real-time writings are NOT usually "fluent" but rather "considered efforts". When we are looking at how language usage reflects "logic", we may want to focus on considered usage; when we want to look at how people tackle problems of idiomatic expression, we can compare conversational usage to the comparable idioms of the speaker's native language.
>and anyone who wants some of that sort
>of Lojban text can just go on irc (at the right times), and there'll
>likely be a few people around to talk to bau la lojban.

The point is not merely to be able to find sample Lojban texts, but to assemble as large a corpus of Lojban usage as possible, so we can go delving to find out whether certain cmavo of obscure meaning have been used, and in what manner they have been used by multiple people. We want to be able to determine NOT what jboske says the word "should" mean, but what usage has said it "does" mean to people.

An upcoming major push on the Lojban dictionary requires that we be able to find out whether words have been used, and whether they've been used in the way Lojbab intended. The words that appear in the gismu and cmavo lists can plausibly be interpreted in other ways, and some people have understood them differently than Lojbab intended %^) Nick has cited, as a proper use of corpora, determining the actual usage of "vo'a".

Once we move in dictionary writing from prescriptive language definition to descriptive reflection of actual usage, this will become even more important. Defining lujvo is more of a descriptive effort, since the place structure rules in CLL are just guidelines.

>Or is this for word frequency-type infos?

That too is a valid use, though not the one I had in mind. If your 10 megs is substantially Lojban, it is decidedly better data than Lojban List, which has a very low percentage of actual Lojban text, much of it snippets, word-proposals, and repeated quotations that can seriously skew any word frequency analysis.

Another possible use is as conversation examples for further efforts at a Lojban textbook. Authentic conversation is far more interesting to learn from than canned "dialogs" that don't actually represent what any *normal* person would say in a conversation. %^)

>Anyway I'm happy to provide them if someone wants them.

We definitely will want them - heck, *I* want them for the LLG archive, but I think an on-line archive is at least as important as my having files here on my computer constituting the "official archive".

We need to find someone willing to index them (and perhaps to weed out any logs that do not have any substantial Lojban text - discussions about the language are interesting but are not a corpus of language usage), and to put them on a site where they can be looked at (lojban.org or elsewhere). And if they get put on a web site, I'd like that site to be included by the group I've asked for, the one that will maintain a list of web sites with Lojban text.

lojbab

--
lojbab                                             lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                    703-385-0273
Artificial language Loglan/Lojban:                 http://www.lojban.org