From sabren@manifestation.com Fri Jul 13 20:11:26 2001
Return-Path: <sabren@manifestation.com>
X-Sender: sabren@manifestation.com
X-Apparently-To: lojban@onelist.com
Received: (EGP: mail-7_2_0); 14 Jul 2001 03:11:25 -0000
Received: (qmail 50609 invoked from network); 14 Jul 2001 03:11:25 -0000
Received: from unknown (10.1.10.26) by l7.egroups.com with QMQP; 14 Jul 2001 03:11:25 -0000
Received: from unknown (HELO mercury.sabren.com) (209.61.186.253) by mta1 with SMTP; 14 Jul 2001 03:11:25 -0000
Received: from localhost (sabren@localhost) by mercury.sabren.com (8.9.3/8.9.3) with ESMTP id XAA29901 for <lojban@onelist.com>; Fri, 13 Jul 2001 23:19:43 -0500
Date: Fri, 13 Jul 2001 23:19:43 -0500 (CDT)
X-Sender: sabren@mercury.sabren.com
To: lojban list <lojban@yahoogroups.com>
Subject: columns 158-164
Message-ID: <Pine.LNX.4.21.0107132253320.29862-100000@mercury.sabren.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Michal Wallace <sabren@manifestation.com>


coi rodo

I'm looking at the gismu list, and I notice two columns of codes right
after the English definitions and before the cross references. What
do these mean?

The first one almost looks like some sort of grouping: blanu, xunre,
and narju all share the code 1a.

The second one... I thought I heard something about word frequency?
I just wrote a little program to sort the list by that number.
The top comes out like:

('cusku', 'express ', '1h ', '872')
('tanru', 'phrase compoun', '1b ', '776')
('prenu', 'person ', '1k ', '632')
('gismu', 'root word ', '1b ', '554')
('djica', 'desire ', '3l ', '500')
('lujvo', 'affix compound', '1b ', '428')
('diklo', 'local ', '5d ', '426')
('klama', 'come ', '1g1', '399')
('bacru', 'utter ', '1h ', '386')
('djuno', 'know ', '1h ', '375')
('sumti', 'argument ', '1b2', '373')
('drata', 'other ', '2g ', '351')
('kumfa', 'room ', '2k ', '346')
('tavla', 'talk ', '1h ', '338')
('nanmu', 'man ', '1k ', '332')
('cmalu', 'small ', '1e ', '326')
('citka', 'eat ', '5c ', '320')
('barda', 'big ', '1e ', '318')

I find it hard to believe tanru is a more common word than citka or
barda, but these do seem to be "simple" Lojban words. Then again,
the other end came out like:

('gluta', 'glove ', 'ao ', ' 0')
('pambe', 'pump ', 'a ', ' 0')
('kanji', 'calculate ', '7e ', ' 0')
('barja', 'bar ', 'ap ', ' 0')
('sigja', 'cigar ', 'a ', ' 0')
('xatsi', '1E-18 ', 'ae ', ' 0')
('petso', '1E15 ', 'ae ', ' 0')
('fanri', 'factory ', '8c ', ' 0')
('barna', 'mark ', 'a ', ' 0')
('tsina', 'stage ', '5g ', ' 0')

These definitely seem less common (or more culture-specific).

Am I reading these two codes right? Where did they come from?
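
For anyone who wants to reproduce the sort, here is a minimal sketch in
Python. It is not necessarily the program that produced the output above.
The column positions are assumptions: the code and count slices are
inferred from the subject line (columns 158-164, read here as a
3-character code followed by a 4-character number), and the word and
keyword slices are guesses that may need adjusting for your copy of the
list. The names parse_line and sort_by_count are made up for this
example. Unlike the output above, it strips padding and converts the
count to an integer.

```python
# Assumed 0-based slices into each fixed-width line; adjust to your copy.
WORD = slice(0, 5)       # the gismu itself (assumed position)
GLOSS = slice(19, 39)    # English keyword (assumed position)
CODE = slice(157, 160)   # columns 158-160: the grouping-like code (assumed)
FREQ = slice(160, 164)   # columns 161-164: the number sorted on (assumed)


def parse_line(line):
    """Return (word, gloss, code, count), or None if the count is not numeric."""
    field = line[FREQ].strip()
    if not field.isdigit():
        return None  # header, blank, or short line
    return (line[WORD].strip(), line[GLOSS].strip(),
            line[CODE].strip(), int(field))


def sort_by_count(lines):
    """Parse every line and sort by the count column, largest first."""
    rows = [row for row in map(parse_line, lines) if row is not None]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

To use it, pass an open file (or any list of lines) to sort_by_count and
slice the result, for example sort_by_count(f)[:18] for the top of the
list and sort_by_count(f)[-10:] for the bottom. Lines without a numeric
count are skipped rather than counted as zero.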

Cheers,

- Michal
----------------------------------------------------------------------
let me host you! http://www.sabren.com me: http://www.sabren.net
----------------------------------------------------------------------