From lojbab@lojban.org Tue Apr 24 14:43:48 2001
Return-Path: <lojbab@lojban.org>
X-Sender: lojbab@lojban.org
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: mail-7_1_2); 24 Apr 2001 21:43:48 -0000
Received: (qmail 18887 invoked from network); 24 Apr 2001 21:43:47 -0000
Received: from unknown (10.1.10.27) by m8.onelist.org with QMQP; 24 Apr 2001 21:43:47 -0000
Received: from unknown (HELO stmpy-2.cais.net) (205.252.14.72) by mta2 with SMTP; 24 Apr 2001 21:43:47 -0000
Received: from bob.lojban.org (72.dynamic.cais.com [207.226.56.72]) by stmpy-2.cais.net (8.11.1/8.11.1) with ESMTP id f3OLhiT44393 for <lojban@yahoogroups.com>; Tue, 24 Apr 2001 17:43:44 -0400 (EDT)
Message-Id: <4.3.2.7.2.20010424161239.00ad1100@127.0.0.1>
X-Sender: vir1036/pop.cais.com@127.0.0.1
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Tue, 24 Apr 2001 17:46:54 -0400
To: lojban@yahoogroups.com
Subject: Re: [lojban] NickFest 2
In-Reply-To: <20010424115814.M28300@digitalkingdom.org>
References: <4.3.2.7.2.20010424020703.00bba800@127.0.0.1> <4.3.2.7.2.20010424020703.00bba800@127.0.0.1>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>

At 11:58 AM 04/24/2001 -0700, Robin Lee Powell wrote:
>Holy sweet potato, was that a long post.
>
>lojbab, it looks like you missed the discussion with Jay about building
>a dictionary entry site, so I'm going to mostly be addressing that in my
>responses.

I didn't miss it, but was not paying close attention to the details.

>On Tue, Apr 24, 2001 at 04:52:26AM -0400, Bob LeChevalier-Logical Language
>Group wrote:
> > The level 1 package has traditionally been a set of wordlists and the
> > E-BNF. We pretty much have all the pieces needed to edit these lists into
> > another book, which will be a "pocket dictionary" of Lojban. We aren't
> > seeking a lot of new material (no more lujvo),
>
>Aaaaawwwww....
>
>_Now_ he tells us. 8)

That's for the pocket dictionary, which has to have some limits or it won't 
fit in a pocket. The unabridged dictionary, not to mention any on-line 
version, can still be added to, as I think I noted.

> > though I want to do something to improve the cmavo list. Depending on
> > time and money, this could come out by the end of this year.
>
><nod>

But this really requires that we minimize new stuff for the pocket dictionary.

> > The LogFest members meeting will decide whether the pocket dictionary and
> > the intro lessons will constitute the baseline dictionary and textbook,
> > starting the infamous "5 year freeze period", or whether we should wait
> > until we publish a full dictionary and more thorough textbook, which
> > projects may start moving along once these other books are done. I will
> > admit to wanting to have the full package for the 5 year period myself, but
> > it is not solely my decision. You should speak up before LogFest.
>
>I want to wait, not least because I'd rather not start the freeze until
>we have at least one substantial (read: book-length) translation
>finished, and more fluent people.
>
>How many known fluents do we have, anyways?

The only known ones are Nick and Goran. Several others can carry on a 
halting conversation, including my wife and me (and I don't use a wordlist 
when I do so, so I am just one step short of fluency, but have never made 
that last step because I cannot stop myself from translating rather than 
trying to think in Lojban).

xod has promised to be skilled enough to speak Lojban only during the 
entirety of the next LogFest, and Mark Shoulson has taken that as a 
challenge to him to do the same. Both have enough skill to be fluent given 
a solid weekend of live practice.

> > A major hold up on the dictionary has been deciding what to do about cmavo,
> > and this also affects the pocket dictionary. The existing cmavo list does
> > not actually define most of the words, and the keywords were designed for
> > LogFlash, not as proper definitions. We thus have three tasks, and I am
> > willing to farm these out to volunteers in whole or in part.
>
><nod>
>
>All of these tasks can be done by many people using the stuff Jay is
>working on.

Possibly, but consistency is going to be a problem. If the work is not 
done consistently on the cmavo list, it will probably have to be redone (as 
you acknowledge below). What we described for lujvo allows for differing 
levels of quality in the definitions, but I don't want to have to deal with 
that with the other word classes.

>Something Jay and I have not discussed, but that I think would make this
>stuff go _much_ faster, is to add in a voting box (probably labelled
>"Good As Is"), so that if you are presented with a word that looks like
>it's just fine, you just check the box and hit submit. If you make any
>actual changes, the box would be ignored.
>
>This would allow 2 things:
>
>1. You (lojbab, nora, &c) could skim briefly all of the words with high
>confidence (more than 3 people think they're good as is, for example).
>
>2. The presentation of entries could be weighted towards showing
>entries that have never received a 'good as is', thus concentrating
>people's work on the relevant stuff.

This all sounds very good for the lujvo effort. I'm less sure it would 
help for the cmavo effort, in part because I'm skeptical that the 
volunteers will show up in force.
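
For concreteness, the voting scheme Robin describes could be sketched 
roughly as follows. This is only an illustration, not Jay's actual code: the 
Entry class, the function names, the 1/(1+votes) weighting, and the choice 
to reset votes when a definition is edited are all assumptions made for the 
example.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    """One dictionary entry as shown on a (hypothetical) web form."""
    word: str
    definition: str
    good_as_is: int = 0  # number of "Good As Is" votes received so far


def submit(entry, new_definition, good_as_is_checked):
    """Record one form submission for an entry."""
    if new_definition != entry.definition:
        # An actual change was made, so the checkbox is ignored.
        # Assumption: earlier votes no longer apply to the edited text.
        entry.definition = new_definition
        entry.good_as_is = 0
    elif good_as_is_checked:
        entry.good_as_is += 1


def high_confidence(entries, threshold=3):
    """Entries that more than `threshold` people marked good as is."""
    return [e for e in entries if e.good_as_is > threshold]


def pick_next(entries, rng=random):
    """Choose the next entry to present, favouring entries with few votes.

    Assumption: weight 1/(1 + votes), so unreviewed entries are most likely.
    """
    weights = [1.0 / (1 + e.good_as_is) for e in entries]
    return rng.choices(entries, weights=weights, k=1)[0]
```

A reviewer would then skim only high_confidence(entries), while volunteers 
are mostly served whatever pick_next returns.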

> > For the Lojban-English side, we need for each cmavo and compound, English
> > keywords if appropriate, a short definition if the word is definable, the
> > selma'o (already there), and a pointer to the reference grammar section(s)
> > discussing the word or the selma'o. Even a Lojban beginner with a copy of
> > the book can work on the latter (and you might learn a lot about the
> > language in the process), since it mostly involves looking stuff up. But
> > don't volunteer unless you think you can commit enough time to do most or
> > all of either the lookup or the definition task within the next few
> > months on your own - we can't afford a coordinator, and even the CVS option
> > that Robin is working on seems inappropriate for this because consistency
> > in style and look-up strategy/coverage is important and editing that sort
> > of thing done by several people might take as long as doing it ourselves.
>
>I _absolutely_ agree with you: CVS is bad for this. But Jay's form
>forces you to do things in a consistent fashion that can be turned into
>_any_ form you like. I think it's perfect for this and, IMO, the work
>is sufficiently boring that if you require a small number of
>highly-committed volunteers, it'll never get done. Having lots of
>people do a little work is the way to go here, IMO.

I'm willing to try anything at least for a little, but I'll remain 
skeptical till I see some response. The Lojban community has certainly 
grown more active of late, but we have a history of small scale volunteer 
efforts never accomplishing anything.

> > The second task is to go through the rather large accumulation of cmavo
> > compounds that have actually been used and decide which of them have a
> > simple English definition, and prepare them as per the above with keywords,
> > definitions and book references.
>
>Assuming that a list of same can be auto-generated, there's no reason
>why Jay's stuff can't be modified to make that work as well.

Fine. But I have to admit that when I do this work, I do it several 
hundred words at a time, and I myself would hate to fill out forms on a 
word by word basis. Maybe that is why I am skeptical - the only way these 
projects have gotten done in the past has been for people to put in hours 
doing chunks of a couple hundred words at a shot.

> > The third task will be to prepare English to Lojban entries for
> > cmavo. Part of this job will use the results of the above tasks - using
> > keyword processing as we used for the gismu list, and formatting the
> > resulting entries to look like the others.
>
><nod>
>
>Sounds like we save that for later, but again, there's no reason, IMO,
>not to use a web-based form.

We'll probably use automated techniques for this. Again, too many words to 
do one at a time.

>Again, once the word list is compiled, the definitions can be done with
>Jay's stuff.
>
>(BTW, Jay, you da man. 8)
>
> > The lujvo will be the hardest project.
>
>For the record, it's the lujvo that Jay's site was originally designed
>for, although I think we were both under the impression that you wanted
>more of them, which doesn't appear to be the case.

I have always been on record that what we want is to see the words that 
actually get used in the language defined. We do need some more semantic 
coverage in some areas, but it is premature to figure out which areas these are 
when we can't determine what words we already have.

> > The new searchable
> > archive of Lojban List is proving extremely helpful in making the
> > context searches, and I am hoping that the yahoogroups archives can be
> > added in to that archive somehow to make things even easier for newly
> > made words.
>
>Shouldn't be hard; wget will grab all of the yahoogroups archives
>trivially.

If you can grab and expand the text archives on lojban.org into the mix, 
we'll have 95%+ of the stuff that the words were extracted from.

> > The hard task will of course be place structures.
>
>And it's for that that Jay's stuff was originally proposed.

I'll look forward to seeing results.

> > For the remaining place structures, Nick feels that we need to abandon our
> > attempt at perfection and careful analysis for each word in the
> > dictionary. Instead, we should have a series of code symbols or font
> > coding to indicate the level of confidence that we have in the place
> > structure, and also to include a code for the place structure writer, who
> > will have more or less credibility based on his perceived knowledge of the
> > language and amount of lujvo analysis done. We also will not try to have
> > complete place structures for every word we put in the dictionary, using
> > the symbol codes to show the level of incompleteness.
>
>I think all of this can be done automagically with the checkbox I
>proposed.

OK.

> > Because this task has languished so long and because the potential is high
> > for getting lots of good stuff going, I would like at least two people to
> > volunteer to work on this either together or independently (the inform work
> > can easily be done independently of any attempt to fix the Basic program).
>
>I, myself, have no interest in doing this because adventure games annoy
>the heck out of me (the puzzle solving just pisses me off).
>
>I am, however, seriously considering creating a lojbanic MUD which, IMO,
>serves all the purposes of making a lojbanic Adventure clone (and can
>certainly have puzzles incorporated into it) except that it's
>multi-user, so it provides for interaction as well.
>
>How much interest do the Powers That Be have in this? It's going to
>take a _lot_ of work, and I don't know that I want to get into it if
>people don't see it as valuable.

I have one volunteer already. The advantage of Colossal Cave is that most 
people know how to solve the puzzles if they have ever played an adventure 
game, it being the original model. That makes it an exercise in reading 
the Lojban so you know the (Lojban) command appropriate to solving the 
puzzle that you already know how to solve.

The main reasons it is "important" are that we don't have this sort of thing 
(your MUD effort will be welcome, in other words) coupled with the fact 
that Nick translated the text files 8 years ago, an enormous 107K of Lojban 
text from the time when people were still struggling to write sentences, 
and they have yet to see the light of day (the text is hidden on the text 
archive, but I doubt that anyone has seen it there amidst the megabytes of 
other stuff). Some of Nick's best Lojban work and the largest text ever 
translated has thus never been read.

lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org