[lojban] Re: Lojban wordlists
At 10:41 PM 5/20/03 +0200, Arne Koewing wrote:
As I'm new to this list let me introduce myself:
I study computer science at the University of Oldenburg
in north Germany (so I believe my English isn't the best ;-).
I've just started learning Lojban, and in parallel I will try out
Lojban as (part of) a knowledge representation language.
I want to write a parser for the wordlists as a first step.
Does anybody have a description of the Lojban wordlists
(gismu.txt, ...)?
My observations for gismu.txt:
+-----------+-----+------+
|type       |start|length|
+-----------+-----+------+
|gismu      |0    |5     |
+-----------+-----+------+
|rafsi1     |6    |3     |
+-----------+-----+------+
|rafsi2     |10   |3     |
+-----------+-----+------+
|rafsi3     |14   |3-4   |
+-----------+-----+------+
|english    |19   |      |
+-----------+-----+------+
|"clue"     |40   |      |
+-----------+-----+------+
|description|61   |      |
+-----------+-----+------+
|???        |158  |2     |
This pertained to the original Lojban textbook, and was the lesson number
and subgroup in which that word was to be introduced. Words were assigned
to lessons 1-9 and a lesson subgroup based on semantics. Unassigned words
were in lesson "a" with or without a semantic subgrouping. Cowan's rewrite
of the first 6 lessons of the draft textbook into 22, which are found on
www.lojban.org, eliminated the tracking of words to lesson, but the gismu
list was never republished after that point.
+-----------+-----+------+
|???        |160  |4     |
This number is the frequency of usage of the word in my then-corpus (1991
or 1992, I think), which included English language texts about Lojban, so
that some words like gismu and sumti were very high. The idea was that
people might want to learn the words that they were more likely to run into
in discussion or usage of Lojban.
Both of these fields were kept in fixed columns to allow sorting on them,
possibly for use in LogFlash, but also presumed to be useful for other purposes.
+-----------+-----+------+
|info...    |168  |      |
+-----------+-----+------+
Included in this info are all the "cf's" which are crosslinks from a
Roget-like analysis of semantic groupings of the gismu, I think originally
done by Veijo Vilva, though I modified it heavily.
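
For anyone writing the parser, here is a minimal Python sketch of the
fixed-width layout tabulated above. It assumes the observed offsets are
accurate (they are observations from this thread, not a verified
specification), that the english, clue and description fields each run up to
the start of the next field, and that the frequency field holds a plain
integer. The names FIELDS, parse_gismu_line and build_row are hypothetical,
and the sample row is hand-built for illustration: its lesson and frequency
values are invented, not copied from gismu.txt.

```python
# Assumed column layout: (name, start, end), end exclusive, None = rest of line.
# Offsets come from the observations in this thread, not from a formal spec.
FIELDS = [
    ("gismu", 0, 5),
    ("rafsi1", 6, 9),
    ("rafsi2", 10, 13),
    ("rafsi3", 14, 18),        # 3-4 characters wide
    ("english", 19, 40),
    ("clue", 40, 61),
    ("description", 61, 158),
    ("lesson", 158, 160),      # old textbook lesson number and subgroup
    ("frequency", 160, 164),   # usage count in the early-1990s corpus
    ("info", 168, None),       # notes, including the "cf." crosslinks
]


def parse_gismu_line(line):
    """Slice one fixed-width line into a dict of stripped field values."""
    line = line.rstrip("\n")
    record = {name: line[start:end].strip() for name, start, end in FIELDS}
    freq = record["frequency"]
    # Assumption: frequency is an unsigned integer; blank or odd values -> None.
    record["frequency"] = int(freq) if freq.isdigit() else None
    return record


def build_row(values):
    """Demo helper: lay out values at the assumed offsets (no truncation)."""
    row = ""
    for name, start, _end in FIELDS:
        row = row.ljust(start) + values.get(name, "")
    return row


# Hand-built sample rows; lesson and frequency values are invented.
rows = [
    build_row({"gismu": "klama", "rafsi2": "kla", "rafsi3": "la'a",
               "english": "come", "clue": "go",
               "description": "x1 goes to x2", "lesson": "1a",
               "frequency": "123", "info": "(cf. litru)"}),
    build_row({"gismu": "litru", "english": "travel",
               "description": "x1 travels via x2", "lesson": "2b",
               "frequency": "45"}),
]

records = [parse_gismu_line(r) for r in rows]

# Sort by frequency, most frequent first; missing counts go last.
by_frequency = sorted(records, key=lambda r: -(r["frequency"] or 0))
for rec in by_frequency:
    print(rec["gismu"], rec["frequency"], rec["lesson"])
```

Slicing by offset rather than splitting on whitespace is deliberate: empty
rafsi columns and multi-word definitions would otherwise shift the fields.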
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org