From lojbab@lojban.org Mon Sep 25 13:10:28 2000
Date: Mon, 25 Sep 2000 16:06:42 -0400
To: lojban@egroups.com
Subject: Re: [lojban] Volunteering for dictionary work
From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>

At 12:38 PM 09/25/2000 +0200, Arnt Richard Johansen wrote:
>I've looked through
><http://www.lojban.org/files/draft-dictionary/Working/>, and I'm
>considering volunteering for preparing lujvo for the dictionary. I have a
>few questions though, as to what needs to be done, and how.
>
>1. Is the task to write keywords and place structures of new lujvo,

I'd like to start with keywords for all the words, with place structures a 
slightly lower priority but still desired. People may be unsure how to do 
the place structures, which takes experience to do with confidence. Even 
then there may be problems, since we have done little cross-checking of 
place structures written by different authors to see whether we are doing 
them consistently. If we have keywords, then we can group semantically 
similar concepts, which will help with that place structure checking and 
also let us decide which words are worth including in the dictionary.

> in the
>same format as the current computerized lujvo list
>(http://www.lojban.org/files/draft-dictionary/NORALUJV.txt)?

Yes. The closer you come to the current format, the more easily we can 
automate converting it to some other form later, if needed.
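To show why a consistent format makes later automation cheap, here is a minimal Python sketch that splits one list entry into its fields. It assumes entries follow the colon-separated layout of the cabdei entry quoted below (lujvo plus "+"-joined source gismu, then keyword, then place structure). LujvoEntry and parse_entry are hypothetical names, and the real NORALUJV.txt layout may differ in detail.

```python
from typing import NamedTuple


class LujvoEntry(NamedTuple):
    lujvo: str
    sources: list[str]  # source gismu, as listed in the entry
    keyword: str
    places: str  # place structure, left in its coded form


def parse_entry(line: str) -> LujvoEntry:
    """Split one lujvo-list line into its fields.

    Assumed layout (modelled on the cabdei entry; the real file may differ):
    "<lujvo> <gismu>+<gismu>...: <keyword>: <place structure>"
    """
    # maxsplit=2 keeps any later colons inside the place structure.
    head, keyword, places = (part.strip() for part in line.split(":", 2))
    lujvo, decomposition = head.split(None, 1)
    return LujvoEntry(lujvo, decomposition.split("+"), keyword, places)


if __name__ == "__main__":
    entry = parse_entry(
        "cabdei cabna+djedi: today: x1 = djedi1 (full day) = cabna1 (now), "
        "x2 = cabna2 (co-occurred with), x3 = djedi3 (full day standard)"
    )
    print(entry.lujvo, entry.sources, entry.keyword)  # cabdei ['cabna', 'djedi'] today
```

A line that strays from the layout (for example, a missing colon) raises a ValueError instead of being silently misread, which is the kind of inconsistency a format check should surface.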

>Or should all
>lujvo, the new ones as well as the one already in the list, be written in a
>new format, specifically for the paper dictionary?

We don't know what such a format would be. I think people would rather see 
a dictionary come out sooner, with consistent definitional forms that take 
a little decoding, than have us delay a long while in order to produce very 
English-idiomatic definitions. I would rather include more words defined 
accurately but less prettily than fewer words with optimal definitions. 
With people coining new lujvo much faster than we can define them, getting 
out a good look-up dictionary soon, to help people find out whether a word 
has already been coined, would be a blessing. (It would also greatly 
improve the glossers to have glosses for a number of lujvo, which depends 
more on keywords than on place structures.)

>One would think that
>lujvo definitions should be written out in full (as is done in the
>computerized gismu list), instead of summarily referring to gismu places.

The computerized gismu list took years and many review passes to get where 
it is today. It was an incredibly time consuming process, and there are 
already several times as many lujvo proposals as there are gismu.

>For instance, in the lujvo list, "cabdei" occurs like this:
>
> cabdei cabna+djedi: today: x1 = djedi1 (full day) = cabna1 (now),
>     x2 = cabna2 (co-occurred with), x3 = djedi3 (full day standard)
>
>But shouldn't it be changed to look like this in an "ordinary" dictionary:
>
> cabdei cabna+djedi today x1 is the day that is simultaneous with x2,
>     by standard x3

We might adopt the policy of rewriting those lujvo that exceed a certain 
threshold of usage (cabdei would be a likely candidate, as would brivla), 
but we aren't ready to decide.

Ideally, I would like submissions to use the coding that the Book uses for 
presenting analyzed place structures (which we can process automatically 
into the form in the lujvo list if it is done in a consistent format). See 
Nick's lujvo list for a mass of words in that brief coded form. This coding 
makes it easier to check what someone else has done. The second form you 
present has lost the analysis information, so anyone checking your work 
would have to look up the place structures of the source gismu and repeat 
the analysis independently, without your work as a clue, to see whether 
s/he agrees with your result. And that checking will have to be done at 
least a couple of times before we put the word in the dictionary. So save 
the pretty wording for later (if ever).
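As a rough illustration of how the coded form lets a checker work mechanically, the Python sketch below turns a place structure written in the list's coded style into a mapping from each lujvo place to the source-gismu places equated with it. It assumes commas separate lujvo places, "=" separates equated terms, and parenthesized text is only an English gloss; parse_places and SOURCE_PLACE are hypothetical names, not an existing tool.

```python
import re

# Hypothetical pattern: one source place is a gismu followed by a place
# number, e.g. "djedi1" or "cabna2".
SOURCE_PLACE = re.compile(r"\b([a-z]+)(\d)\b")


def parse_places(places: str) -> dict[str, list[str]]:
    """Map each lujvo place (x1, x2, ...) to the gismu places equated with it."""
    result: dict[str, list[str]] = {}
    # Assumption: parenthesized text is only a gloss, so drop it first.
    text = re.sub(r"\([^)]*\)", "", places)
    # Assumption: commas separate lujvo places; "=" separates equated terms.
    for clause in text.split(","):
        if not clause.strip():
            continue
        head, *sources = (term.strip() for term in clause.split("="))
        result[head] = [
            gismu + number
            for term in sources
            for gismu, number in SOURCE_PLACE.findall(term)
        ]
    return result


if __name__ == "__main__":
    coded = ("x1 = djedi1 (full day) = cabna1 (now), x2 = cabna2 (co-occurred with), "
             "x3 = djedi3 (full day standard)")
    print(parse_places(coded))
    # {'x1': ['djedi1', 'cabna1'], 'x2': ['cabna2'], 'x3': ['djedi3']}
```

Two reviewers' analyses of the same lujvo could then be compared simply by testing whether their mappings are equal, which is not possible once the analysis has been rewritten into prose like "x1 is the day that is simultaneous with x2".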

>2. How should we find out (ma ve djuno, roughly "by what epistemology?")
>the meanings of the lujvo that haven't been defined yet?

If you KNOW what it means, as in this case because you used it, say what 
your intent/understanding was in using it, and feel free to note in your 
submission that you actually used the word that way. How a word has 
actually been used is a more valuable guideline than the analytical opinion 
of someone doing a chunk of 100 words that he never saw before he looked at 
the lujvo list. You might have made some mistakes in your coinings, but 
given your annotation I would expect a higher standard of argument to 
justify a meaning different from the one you intended.

> As an example, take the word "vlatai", which has
>occurred relatively often in the text corpus (34 times). I distinctly
>remember using that particular word in a conversation with Jorge on the
>list, intending it to mean "x1 is an inflected form of word/lexeme x2,
>yielding meaning x3".

Then put that down with a note saying that this was your intent when using 
it, with keyword "inflected form". Later place structure analysis may come 
up with a different result, but if you used it a certain way, then that 
should guide the place structure analysis.

> Now, since I only have the eGroups archives handy,
>it is difficult for me to find enough usage of it, so that I can be sure
>that my interpretation of the word is indeed the most correct.

Correctness is a relative thing when we as yet have no standard (the point 
is to make a standard). I am not expecting everyone to do an archive 
search for each word. For keyword analysis, I would be happy to have a 
best guess for all the words. Then people can look at others' proposed 
keywords and see if they agree. We can do an archive search later for the 
words for which there is some uncertainty (and there is enough usage that 
we are likely to be able to have usage resolve the issue).

In any event it will be a multi-pass analysis. Nora has already found that 
it is impossible to maintain consistency over an analysis of even 1000 
words, and we are getting closer to 10,000. So I want to build multiple 
passes by multiple people into the approach to defining the words, so as to 
catch the most consistency errors possible with the least effort.

If you do 50 words superbly, you are unlikely to notice any consistency 
errors. If you do 500 words in multiple passes, spending less time on each 
word as you go, you will sometimes end up correcting yourself on a later 
pass, but you will feel more productive and your result will be far more 
useful. And if you have to quit after partially doing a large chunk of 
words, someone else can take over and do the next step, performing a 
consistency check as THEY go.

lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org