From lojbab@lojban.org Fri Aug 30 07:04:13 2002
Return-Path: <lojbab@lojban.org>
X-Sender: lojbab@lojban.org
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: mail-8_1_0_1); 30 Aug 2002 14:04:13 -0000
Received: (qmail 28164 invoked from network); 30 Aug 2002 14:04:12 -0000
Received: from unknown (66.218.66.216)
  by m1.grp.scd.yahoo.com with QMQP; 30 Aug 2002 14:04:12 -0000
Received: from unknown (HELO lakemtao02.cox.net) (68.1.17.243)
  by mta1.grp.scd.yahoo.com with SMTP; 30 Aug 2002 14:04:09 -0000
Received: from lojban.lojban.org ([68.100.206.153]) by lakemtao02.cox.net
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
          id <20020830140408.PKSM12192.lakemtao02.cox.net@lojban.lojban.org>
          for <lojban@yahoogroups.com>; Fri, 30 Aug 2002 10:04:08 -0400
Message-Id: <5.1.0.14.0.20020830094058.0331a8f0@pop.east.cox.net>
X-Sender: lojbab@pop.east.cox.net (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Fri, 30 Aug 2002 09:57:00 -0400
To: lojban@yahoogroups.com
Subject: dictionary - which words?
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
From: Bob LeChevalier-Logical Language Group <lojbab@lojban.org>
X-Yahoo-Group-Post: member; u=1120595
X-Yahoo-Profile: lojbab
X-Yahoo-Message-Num: 15299

I have been doing some thinking about the dictionary, and what words it 
should contain (I'm especially thinking in terms of lujvo with place 
structures, fu'ivla, and cmavo compounds).  My historical position has been 
that most-used words should be documented.  But there has been much debate 
about my methods for gathering data on usage, and other people using 
different methods get different data.

 From a marketing standpoint, a different standard is preferable.  At some 
early point, we will want to put out a Lojban chrestomathy or some other 
collection of definitive Lojban texts of varying levels of difficulty, and 
of varying style.  In the way people tackle such things, they will expect 
to find the words that they don't know in the dictionary.  Should we then 
not be identifying some of our key writings likely to be included in an 
early chrestomathy, and making sure that ALL of the words used therein are 
well-defined in the dictionary?  If they aren't, won't some people trying 
to use a chrestomathy to study the language be turned off by the fact that 
so many words are not in the dictionary?

If people want to use Jay's tool for dictionary entry, I presume that 
someone could write code that would decompose entire Lojban texts into 
unique word lists and usage data, in a form suitable for loading into his 
database, perhaps retaining the first 1 or 2 sentences it finds as example 
texts to be co-loaded (and to provide people working on the dictionary with 
a clue as to what the word means when it isn't obvious).
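
As a rough illustration of the kind of code meant here, the following is a 
minimal Python sketch, not a finished tool.  It rests on simplifying 
assumptions: words are separated by whitespace, periods are only pause 
marks, and a new sentence starts at each ".i".  The names split_sentences 
and word_usage are hypothetical, and the load format for Jay's tool is not 
known here, so the sketch simply returns (word, count, examples) rows.  A 
real tool would need a proper morphological parser to break up cmavo 
compounds and handle names.

```python
from collections import Counter


def split_sentences(text):
    """Split a text into sentences, each a list of words.

    Assumption: a new sentence begins at the separator "i" (written ".i").
    """
    sentences, current = [], []
    for raw in text.split():
        word = raw.strip(".")  # drop pause marks around the word
        if not word:
            continue
        if word == "i" and current:
            sentences.append(current)
            current = []
        current.append(word)
    if current:
        sentences.append(current)
    return sentences


def word_usage(text, max_examples=2):
    """Return (word, count, example_sentences) rows, most used first."""
    counts = Counter()
    examples = {}
    for sentence in split_sentences(text):
        joined = " ".join(sentence)
        for word in sentence:
            counts[word] += 1
            kept = examples.setdefault(word, [])
            # Keep only the first few distinct sentences as examples.
            if len(kept) < max_examples and joined not in kept:
                kept.append(joined)
    return [(w, n, examples[w]) for w, n in counts.most_common()]


if __name__ == "__main__":
    sample = ".i mi klama le zarci .i do klama le zdani .i mi prami do"
    for word, count, sentences in word_usage(sample):
        print(word, count, sentences)
```

Running several texts through word_usage and merging the rows would give 
both the unique word list and a crude usage count, with the retained 
sentences available as context for whoever writes the definition.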

Of course the problem is still that we need people with the stamina to 
tackle lists of ten or a hundred words at a time, figure out what 
they mean, and write place structures for them, until all are done.  That 
remains the biggest volunteer task sitting untouched, and it is one which 
people can undertake at any scale they wish from 1 word a day to a hundred 
at a time.

lojbab

-- 
lojbab                                             lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                    703-385-0273
Artificial language Loglan/Lojban:                 http://www.lojban.org