Received: from spooler by stryx.demon.co.uk (Mercury/32 v2.01); 28 Sep 98 00:45:14 +0000
Return-path:
Received: from punt-11.mail.demon.net (194.217.242.34) by stryx.demon.co.uk (Mercury/32 v2.01); 28 Sep 98 00:45:08 +0000
Received: from punt-1.mail.demon.net by mailstore for ia@stryx.demon.co.uk id 906624134:10:00671:4; Thu, 24 Sep 98 08:02:14 GMT
Received: from listserv.cuny.edu ([128.228.100.10]) by punt-1.mail.demon.net id aa1000637; 24 Sep 98 8:02 GMT
Received: from listserv (listserv.cuny.edu) by listserv.cuny.edu (LSMTP for Windows NT v1.1b) with SMTP id <1.FE9D2E1A@listserv.cuny.edu>; Thu, 24 Sep 1998 4:03:31 -0400
Date: Thu, 24 Sep 1998 04:00:57 -0400
Reply-To: Logical Language Group
Sender: Lojban list
From: Logical Language Group
Subject: word frequency list coming
X-To: lojban@cuvmb.cc.columbia.edu
To: Multiple recipients of list LOJBAN
Message-ID: <906624130.10637.0@listserv.cuny.edu>
X-PMFLAGS: 33554560 7 1 Y053AB.CNM
Content-Length: 1571
Lines: 32

Well, I took a look around the Web and found a word-frequency/concordance program. I have run it on the entirety of my Lojban text archive (which unfortunately has some repetitions in it because of commentaries, quoted text and revisions) and am working on filtering out all the garbage.

This is all the Lojban text that I have up to 10/94; my mail processing is so far backlogged that I haven't extracted the Lojban text from my logs since then (it takes me around 1-2 hours per month, so don't hold your breath %^)

When I get done weeding it, I will make it available on the FTP site, probably within a week.

An advantage over previous efforts of this sort is that, if the concordance function works, we can check some of the questionable entries to see the context, since the program maintains a keyword-in-context database (I think).
Previously, good English words like "simple" were detected as Lojban words (since they are valid ones), and without the ability to check context, we have had no way to find out whether the word was actually used in LOJBAN. Of course, most of the usage of lujvo that Jorge is interested in dates from after 10/94 (but then he has done most of such writing anyway %^).

lojbab
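
For readers unfamiliar with the idea, here is a rough sketch of what a word-frequency count with a keyword-in-context (KWIC) lookup might look like. It is not the program mentioned above, whose internals are not described in this message; the function name `build_concordance`, the `width` parameter, the tokenizing pattern and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter, defaultdict

def build_concordance(text, width=3):
    """Return (frequency table, keyword-in-context index) for a text.

    Hypothetical helper: tokens are runs of letters and apostrophes,
    lowercased. `width` is how many neighbouring tokens to keep on
    each side of every occurrence.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    freq = Counter(tokens)
    kwic = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        kwic[tok].append((left, right))
    return freq, kwic

# Made-up sample mixing Lojban-looking and English text.
sample = "mi klama le zarci .i this is a simple test .i le zarci cu simple"
freq, kwic = build_concordance(sample)

# Frequency alone cannot tell an English "simple" from a Lojban one;
# printing each occurrence with its neighbours lets a person decide.
for left, right in kwic["simple"]:
    print(f"{left:>20} [simple] {right}")
```

With an index like this, a questionable entry in the frequency list can be checked by looking at the words around each occurrence instead of guessing from the count alone.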