From: Bob LeChevalier-Logical Language Group
Date: Fri, 30 Aug 2002 09:57:00 -0400
To: lojban@yahoogroups.com
Subject: dictionary - which words?

I have been doing some thinking about the dictionary and what words it should contain (I'm especially thinking in terms of lujvo with place structures, fu'ivla, and cmavo compounds).

My historical position has been that the most-used words should be documented. But there has been much debate about my methods for gathering data on usage, and other people using different methods get different data.

From a marketing standpoint, a different standard is preferable. At some early point, we will want to put out a Lojban chrestomathy or some other collection of definitive Lojban texts of varying levels of difficulty and of varying style. Given the way people tackle such things, they will expect to find the words that they don't know in the dictionary.
Should we not then be identifying some of our key writings likely to be included in an early chrestomathy, and making sure that ALL of the words used therein are well-defined in the dictionary? If they aren't, won't some people trying to use a chrestomathy to study the language be turned off by the fact that so many words are not in the dictionary?

If people want to use Jay's tool for dictionary entry, I presume that someone could write code that would decompose entire Lojban texts into unique word lists and usage data, in a form suitable for loading into his database, perhaps retaining the first 1 or 2 sentences it finds as example texts to be co-loaded (and to provide people working on the dictionary with a clue as to what the word means when it isn't obvious).

Of course the problem is still that we need people with the stamina to tackle lists of ten or a hundred words at a time, figure out what they mean, and write place structures for them, until all are done. That remains the biggest volunteer task sitting untouched, and it is one which people can undertake at any scale they wish, from 1 word a day to a hundred at a time.

lojbab

--
lojbab                                  lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA    703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org
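
The word-list code suggested above (decomposing a text into unique words with usage counts and the first 1 or 2 example sentences) might look roughly like the following. This is only a minimal sketch under stated assumptions, not a description of Jay's tool or its load format: the function names `split_sentences` and `build_word_list` are hypothetical, words are taken to be runs of letters and apostrophes, and the separator word "i" (usually written ".i") is assumed to be the only sentence boundary. A real tool would need a proper Lojban word-breaker.

```python
import re
from collections import Counter, defaultdict

# Assumption: a word is a run of letters and apostrophes; periods and
# other punctuation are treated as separators and discarded.
WORD_RE = re.compile(r"[a-z']+")


def split_sentences(text):
    """Split text into sentences, each a list of lowercased words.

    Simplifying assumption: the word "i" (usually written ".i") marks the
    start of a new sentence. The separator itself is dropped.
    """
    sentences, current = [], []
    for word in WORD_RE.findall(text.lower()):
        if word == "i":
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(word)
    if current:
        sentences.append(current)
    return sentences


def build_word_list(text, max_examples=2):
    """Return {word: {"count": n, "examples": [...]}}, sorted by word.

    Each word keeps at most `max_examples` distinct sentences, in the
    order they first appear, as candidate example texts.
    """
    counts = Counter()
    examples = defaultdict(list)
    for sentence in split_sentences(text):
        sentence_text = " ".join(sentence)
        for word in sentence:
            counts[word] += 1
            seen = examples[word]
            if len(seen) < max_examples and sentence_text not in seen:
                seen.append(sentence_text)
    return {
        word: {"count": counts[word], "examples": examples[word]}
        for word in sorted(counts)
    }


if __name__ == "__main__":
    sample = ".i mi klama le zarci .i do klama le zdani .i mi klama"
    for word, entry in build_word_list(sample).items():
        print(word, entry["count"], entry["examples"])
```

The resulting dictionary could then be written out in whatever form the dictionary-entry database expects. That export step is left out here because the required format is not specified.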