
[lojban] Re: Lojban wordlists



At 10:41 PM 5/20/03 +0200, Arne Koewing wrote:
As I'm new to this list, let me introduce myself:
I study computer science at the University of Oldenburg
in northern Germany (so I believe my English isn't the best ;-).
I've just started learning Lojban, and in parallel I will try out
Lojban as (part of) a knowledge representation language.
I want to write a parser for the wordlists as a first step.

Does anybody have a description of the Lojban wordlists
(gismu.txt, ...)?

My observations for gismu.txt:
+-----------+-----+------+
|type       |start|length|
+-----------+-----+------+
|gismu      |0    |5     |
+-----------+-----+------+
|rafsi1     |6    |3     |
+-----------+-----+------+
|rafsi2     |10   |3     |
+-----------+-----+------+
|rafsi3     |14   |3-4   |
+-----------+-----+------+
|english    |19   |      |
+-----------+-----+------+
|"clue"     |40   |      |
+-----------+-----+------+
|description|61   |      |
+-----------+-----+------+
|???        |158  |2     |

This column pertained to the original Lojban textbook: it gave the lesson number and subgroup in which that word was to be introduced. Words were assigned to lessons 1-9 and to a lesson subgroup based on semantics. Unassigned words were in lesson "a", with or without a semantic subgrouping. Cowan's rewrite of the first 6 lessons of the draft textbook into 22, which are found on www.lojban.org, eliminated the tracking of words to lessons, but the gismu list was never republished after that point.

+-----------+-----+------+
|???        |160  |4     |

This number is the frequency of usage of the word in my then-corpus (1991 or 1992, I think), which included English-language texts about Lojban, so some words like gismu and sumti scored very high. The idea was that people might want to learn first the words that they were more likely to run into in discussion or usage of Lojban.

Both of these were placed in fixed columns to allow sorting on them, possibly for use in LogFlash, but they were also presumed to be useful for other purposes.


+-----------+-----+------+
|info...    |168  |      |
+-----------+-----+------+

Included in this info are all the "cf's", which are crosslinks from a Roget-like analysis of semantic groupings of the gismu, I think originally done by Veijo Vilva, though I modified it heavily.
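
For anyone starting on the parser, here is a minimal sketch in Python that slices one line of gismu.txt at the offsets observed above. It is not an official format description: the offsets are only the observations quoted in this thread (read as 0-based), the field names "lesson" and "frequency" are labels chosen here to match the explanation above, and the function name parse_gismu_line is hypothetical. The info field is kept as raw text, since its internal layout is not described here.

```python
# Fixed-width layout of gismu.txt as observed in this thread.
# Each entry is (field name, start offset, end offset), 0-based, end exclusive.
# These offsets are assumptions taken from the observations above,
# not from any official specification of the file.
FIELDS = [
    ("gismu", 0, 5),
    ("rafsi1", 6, 9),
    ("rafsi2", 10, 13),
    ("rafsi3", 14, 18),         # observed length 3-4
    ("english", 19, 40),
    ("clue", 40, 61),
    ("description", 61, 158),
    ("lesson", 158, 160),       # lesson number and subgroup
    ("frequency", 160, 164),    # usage count in the early-1990s corpus
    ("info", 168, None),        # free text, including the "cf" crosslinks
]


def parse_gismu_line(line):
    """Split one line of gismu.txt into a dict of stripped field strings.

    Fields that fall past the end of a short line come back as "".
    """
    line = line.rstrip("\n")
    return {name: line[start:end].strip() for name, start, end in FIELDS}


def frequency_key(record):
    """Sort key for the frequency column; blank or non-numeric counts as 0."""
    value = record["frequency"]
    return int(value) if value.isdigit() else 0
```

With records parsed this way, sorting on a column is just sorted(records, key=frequency_key, reverse=True). Any header line in the file would need to be skipped first, and the offsets should be checked against the actual file before relying on them.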


--
lojbab                                             lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                    703-385-0273
Artificial language Loglan/Lojban:                 http://www.lojban.org