Re: [lojban] Volunteering for dictionary work
At 12:38 PM 09/25/2000 +0200, Arnt Richard Johansen wrote:
> I've looked through
> <http://www.lojban.org/files/draft-dictionary/Working/>, and I'm
> considering volunteering for preparing lujvo for the dictionary. I have a
> few questions though, as to what needs to be done, and how.
> 1. Is the task to write keywords and place structures of new lujvo,
I'd like to start with keywords for all the words, with place structures at
slightly lower priority but still desired. People might be unsure of how to
do the place structures (it takes experience to do them with confidence,
and even then there may be problems, given that we have done little
cross-checking of place structures by different authors to see if we
are doing them consistently). If we have keywords, then we can semantically
group similar concepts, which will help in that place structure checking as
well as allow us to decide which words are worth including in the dictionary.
> in the
> same format as the current computerized lujvo list
> (http://www.lojban.org/files/draft-dictionary/NORALUJV.txt)?
Yes. The closer you come to the current format, the more automated will be
the process of putting it in some other form later if needed.
> Or should all
> lujvo, the new ones as well as the ones already in the list, be written in a
> new format, specifically for the paper dictionary?
We don't know what such a format would be. I think that people would rather
see a dictionary come out sooner with consistent definitional forms that
take a little decoding rather than have us delay a long while in order to
have very English-idiomatic definitional forms. I would rather include
more words defined accurately but less prettily, rather than fewer words
with optimal definitions. With people coining new lujvo at a rate much
faster than we can define them, speed in getting a good look-up dictionary
to help people find a word if it has already been coined would be a
blessing (it would also greatly enhance glossers to have glosses for a
number of lujvo, which requires keywording more than place structures).
> One would think that
> lujvo definitions should be written out in full (as is done in the
> computerized gismu list), instead of summarily referring to gismu places.
The computerized gismu list took years and many review passes to get where
it is today. It was an incredibly time consuming process, and there are
already several times as many lujvo proposals as there are gismu.
> For instance, in the lujvo list, "cabdei" occurs like this:
>
> cabdei cabna+djedi: today: x1 = djedi1 (full day) = cabna1 (now),
> x2 = cabna2 (co-occurred with), x3 = djedi3 (full day standard)
>
> But shouldn't it be changed to look like this in an "ordinary" dictionary:
>
> cabdei cabna+djedi today x1 is the day that is simultaneous with x2,
> by standard x3
We might adopt the policy of rewriting those lujvo that exceed a certain
threshold of usage (cabdei would be a likely candidate, as would brivla),
but we aren't ready to decide.
Ideally, I would like the coding that the Book uses in presenting place
structures as analyzed (which we can process automatically into the form in
the lujvo list if it is done in a consistent format). See Nick's lujvo
list to find a mass of words in the brief coded form. This makes it easier
to check what someone else has done. The second form you present has lost
the analysis information, so someone checking your work must look
up the place structures of the source gismu and perform the analysis
independently, without your work as a clue, to see whether s/he
agrees with what you have come up with. And that checking will have to be
done at least a couple of times before we put the word in the
dictionary. So save the pretty wording for later (if ever).
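
As a rough illustration of why the coded form lends itself to automatic processing and checking, here is a minimal Python sketch that parses an entry shaped like the "cabdei" example quoted above. The entry layout is assumed from that one example only; the actual lujvo list file may be laid out differently, and parse_entry is a hypothetical helper, not an existing tool.

```python
import re

# Assumed entry shape, copied from the "cabdei" example quoted above:
#   lujvo gismu+gismu: keyword: x1 = gismuN (gloss) = gismuN (gloss), x2 = ...
# The layout of the real lujvo list file may differ.
PLACE_RE = re.compile(r"x(\d+)\s*=\s*(.+)")
SOURCE_RE = re.compile(r"([a-z']+)(\d+)\s*(?:\(([^)]*)\))?$")


def parse_entry(line):
    """Split one coded entry into lujvo, source gismu, keyword, and places."""
    head, keyword, places_text = (part.strip() for part in line.split(":", 2))
    lujvo, sources = head.split()
    places = {}
    # Splitting on "," assumes that glosses never contain commas.
    for chunk in places_text.split(","):
        place = PLACE_RE.match(chunk.strip())
        if place is None:
            raise ValueError(f"unrecognized place: {chunk!r}")
        refs = []
        # Each "=" joins another source-gismu place to the same lujvo place.
        for ref in place.group(2).split("="):
            source = SOURCE_RE.match(ref.strip())
            if source is None:
                raise ValueError(f"unrecognized source place: {ref!r}")
            gismu, number, gloss = source.groups()
            refs.append((gismu, int(number), gloss))
        places[int(place.group(1))] = refs
    return {"lujvo": lujvo, "gismu": sources.split("+"),
            "keyword": keyword, "places": places}


entry = parse_entry(
    "cabdei cabna+djedi: today: x1 = djedi1 (full day) = cabna1 (now), "
    "x2 = cabna2 (co-occurred with), x3 = djedi3 (full day standard)"
)
for number, refs in sorted(entry["places"].items()):
    print(f"x{number}:", " = ".join(f"{g}{n}" for g, n, _ in refs))
```

Because every lujvo place still names the gismu places it came from, a checker can see the analysis directly instead of reconstructing it from prose.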
> 2. How should we find out (ma ve djuno) the meanings of the lujvo that
> haven't been defined yet?
If you KNOW what it means, as in this case because you used it, say what
your intent/understanding was in using it, and feel free to note in what
you submit that you actually used the word that way. How a word has
actually been used is a more valuable guideline than the analytical opinion
of someone who is doing a chunk of 100 words that he never saw before he
looked at the lujvo list. You might have made some mistakes in your
coinings, but given your annotation I would expect a higher standard of
argument to justify a meaning different from the one you intended.
> As an example, take the word "vlatai", which has
> occurred relatively often in the text corpus (34 times). I distinctly
> remember using that particular word in a conversation with Jorge on the
> list, intending it to mean "x1 is an inflected form of word/lexeme x2,
> yielding meaning x3".
Then put that down with a note saying that this was your intent when using
it, with keyword "inflected form". Later place structure analysis may come
up with a different result, but if you used it a certain way, then that
should guide the place structure analysis.
> Now, since I only have the eGroups archives handy,
> it is difficult for me to find enough usage of it, so that I can be sure
> that my interpretation of the word is indeed the most correct.
Correctness is a relative thing when we as yet have no standard (the point
is to make a standard). I am not expecting everyone to do an archive
search for each word. For keyword analysis, I would be happy to have a
best guess for all the words. Then people can look at others' proposed
keywords and see if they agree. We can do an archive search later for the
words for which there is some uncertainty (and for which there is enough
usage that it is likely to resolve the issue).
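
The comparison of proposed keywords could be partly mechanized. The following Python sketch, under stated assumptions, collects best-guess keywords from several people and flags only the lujvo on which they disagree, so that archive searches can be reserved for those. The reviewer names and the second "vlatai" keyword are made up for illustration, and find_disagreements is a hypothetical helper.

```python
from collections import defaultdict


def find_disagreements(proposals):
    """Return {lujvo: {keyword: [reviewers]}} for lujvo with differing keywords.

    proposals is an iterable of (reviewer, lujvo, keyword) tuples.
    Keywords are compared after trimming whitespace and lowercasing.
    """
    by_word = defaultdict(lambda: defaultdict(list))
    for reviewer, lujvo, keyword in proposals:
        by_word[lujvo][keyword.strip().lower()].append(reviewer)
    return {
        lujvo: dict(keywords)
        for lujvo, keywords in by_word.items()
        if len(keywords) > 1
    }


# Hypothetical sample data: reviewers "A" and "B" and the keyword
# "word form" are invented for this example.
proposals = [
    ("A", "cabdei", "today"),
    ("B", "cabdei", "Today"),
    ("A", "vlatai", "inflected form"),
    ("B", "vlatai", "word form"),
]
uncertain = find_disagreements(proposals)
for lujvo, keywords in uncertain.items():
    print(lujvo, keywords)
```

Words on which everyone agrees drop out of the result, which keeps each later pass short.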
In any event it will be a multi-pass analysis. Nora has already found that
it is impossible to maintain consistency over an analysis of even 1000
words, and we are getting closer to 10,000. So I want to build multiple
passes by multiple people into the approach to defining the words, so as to
catch the most consistency errors possible with the least effort.
If you do 50 words superbly, you are unlikely to notice any consistency
errors. If you do 500 words in multiple passes that take less time on each
word as you go, you will end up correcting yourself sometimes on a later
pass, but you will feel more productive and your result will be far more
useful. And if you have to quit after doing a large chunk of words
partially, someone else can take over and do the next step, performing a
consistency check as THEY go.
lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org