Return-Path: <@FINHUTC.HUT.FI:LOJBAN@CUVMB.BITNET>
Received: from FINHUTC.hut.fi by xiron.pc.helsinki.fi with smtp (Linux Smail3.1.28.1 #1) id m0qRYjI-000023C; Sat, 23 Jul 94 07:30 EET DST
Message-Id:
Received: from FINHUTC.HUT.FI by FINHUTC.hut.fi (IBM VM SMTP V2R2) with BSMTP id 5964; Sat, 23 Jul 94 07:29:05 EET
Received: from SEARN.SUNET.SE (NJE origin MAILER@SEARN) by FINHUTC.HUT.FI (LMail V1.1d/1.7f) with BSMTP id 5963; Sat, 23 Jul 1994 07:29:05 +0200
Received: from SEARN.SUNET.SE (NJE origin LISTSERV@SEARN) by SEARN.SUNET.SE (LMail V1.2a/1.8a) with BSMTP id 9084; Sat, 23 Jul 1994 06:28:15 +0200
Date: Sat, 23 Jul 1994 00:29:59 -0400
Reply-To: Logical Language Group
Sender: Lojban list
From: Logical Language Group
Subject: Re: lojban thesaurus query
X-To: WAUGH@ACM.ORG
X-cc: lojban@cuvmb.cc.columbia.edu
To: Veijo Vilva
Content-Length: 1851
Lines: 34

JW> That's what the English-Lojban section of the dictionary
JW> is going to be, isn't it? In effect a cross reference that
JW> catches every use of an English word in describing a place
JW> in a Lojban predicate, and lists them by English word.

Indeed, although we are editing the file generated by the automatic routines that produced the current files. One would not want 1491 entries for "is" (rather useless), nor the slightly more useful 438 entries for "in" (but we can't drop them all; we need nenri, among others), 166 for "with" (kansa), 107 for "material" (most are probably somewhat relevant, but a combined entry would be worth more than a 107-line repetition of complete place structures, as the auto-process would generate), or 52 for "reflects" (now we start getting into tougher judgement calls).

I am right now editing the first English keyword file, with 10000 lines/entries corresponding to all usages with no more than 20 occurrences in the auto-process file. The more voluminous other entries will be weeded vigorously and combined as needed to keep them short enough to be used.
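
For readers who want a concrete picture of the kind of automatic cross-referencing described above, here is a rough sketch in Python. It is not the actual routine used for the dictionary files. The three sample definitions are simplified paraphrases, the function names are made up for illustration, and listing each Lojban word only once per English keyword is an assumption. Only the cutoff of 20 occurrences comes from the message itself.

```python
import re
from collections import defaultdict

# Hypothetical sample data: Lojban word -> simplified English place structure.
# These are paraphrases for illustration, not the official definitions.
SAMPLE_DEFINITIONS = {
    "nenri": "x1 is in/inside/within x2",
    "kansa": "x1 is with/accompanies x2 in state/condition/enterprise x3",
    "blanu": "x1 is blue",
}

PLACE_LABEL = re.compile(r"x\d+")       # place markers such as x1, x2
TOKEN = re.compile(r"[a-z][a-z0-9']*")  # lowercase word-like tokens


def build_keyword_index(definitions):
    """Map each English keyword to the Lojban words whose definition uses it."""
    index = defaultdict(list)
    for word in sorted(definitions):
        # Assumption: a Lojban word is listed once per keyword, even if the
        # keyword occurs several times in its definition.
        tokens = set(TOKEN.findall(definitions[word].lower()))
        for keyword in sorted(tokens):
            if PLACE_LABEL.fullmatch(keyword):
                continue  # skip x1, x2, ... which are not English keywords
            index[keyword].append(word)
    return dict(index)


def split_by_frequency(index, limit=20):
    """Separate keywords with at most `limit` entries from the bulkier ones."""
    rare = {k: v for k, v in index.items() if len(v) <= limit}
    frequent = {k: v for k, v in index.items() if len(v) > limit}
    return rare, frequent


if __name__ == "__main__":
    index = build_keyword_index(SAMPLE_DEFINITIONS)
    rare, frequent = split_by_frequency(index, limit=20)
    for keyword in sorted(rare):
        print(keyword, rare[keyword])
```

On a full word list, the `frequent` half is where keywords like "is" and "in" would land, and it is that half that needs weeding and combining by hand.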
Since LogFest (when I started) I have done around 1840 lines of 9500 total, and am doing more than 500 lines per day and accelerating as I get used to my system. I am hoping to have this first list up on the ftp site for review/usage by the end of the month, and it will probably be close to its current (1.9 Meg) length. The cmavo list is somewhat smaller (600K raw), and the lujvo list E-entries are very large (4.4 Meg unedited, covering some 3000 Lojban lujvo; this will likely be cut to 2-3 Meg).

I'll let this serve as an ad hoc dictionary progress report, and also let people know the scope of the dictionary, by the size of these files that we are playing with. Should be a Good Thing, from the way it is shaping up so far .o'acai

lojbab