
Lojban word lists, etc.




>On Tue, 20 Oct 1998, C.D. Wright wrote:
>
>> Sorry about the multiple posts in quick succession, but before
>> I get even more emails telling me about the gismu list at ...
>>
>>     ftp://ftp.access.digex.net/pub/access/lojbab/wordlists/gismu
>>
>> Let me tell you why it's next to useless for me.
>>...

>> I need/want an English/lojban dictionary that doesn't give just the
>> entries that have close equivalents, but gives near matches as well.
>> That's what I'm trying to compile.
>
>Have you seen engdict.gis?  It's a dictionary (English->Lojban)
>of sorts.

I will second this suggestion, since it is the one I would have made.  It is,
by the way, not merely a "dictionary of sorts", but the unedited draft of
the dictionary we are eventually going to publish.  There is still some weeding
to do of entries that appear in multiple forms because several files were merged.
The list is also missing a few dozen English words that grep in the gismu list
to dozens of lines - those entries take a lot more weeding and reformatting
to put into proper dictionary form.  And we have not yet dealt with the
cmavo list, or with the ever-growing number of new lujvo that crop up in
text.  But what is in that file is the core of the dictionary-to-come in
roughly the entry style that we expect the dictionary to use (the lines will
of course be formatted so as to be page-readable).

Now what it sounds like Colin Wright might want is to take all lines
formatted as:
>    *like (apparent similarity), x1 seems/appears to have
>          property(ies) x2 to observer x3 under conditions
>          x4 /:/ [also: x1 seems | it has x2 to x3; suggest
>          belief/observation (= mlugau, mluti'i); looks
>          |/resembles (= smimlu, mitmlu)] /=/ simlu (mlu)

and eliminate the text between the first comma (separating the entry word
and clarification of semantics) and the /=/ that indicates the Lojban word
follows.  That will give a much smaller greppable file that will also
include words that are present only in the semantic clarification, which may
provide a few synonyms that are not actual entries.

A second-level approach would be to connect an English-language thesaurus to
this list, and write a front end that would first look for the word among
the keywords, then look among the semantic clarifications in parens, and then
look in the thesaurus for synonyms, which should then be looked up only
at the keyword level.  There are public domain thesauri on the net (see
my mention a few days ago about the Gutenberg edition of Roget) to serve as
a basis for this.  I will propose that Colin Wright or some ambitious
programmer with time on his hands volunteer to put together such a front end,
which would essentially provide all of us with an on-line dictionary.  (An
obvious addition to such an effort would display all entries found by the
search described above, allow the user to select one, and then retrieve the
full definition line so that the place structure is available.)  Needless to
say, this is a project that could expand through enhancements into a major
undertaking.  John Cowan started 6 or 7 years ago to provide me with examples
for each cmavo in the cmavo list and ended up writing the reference grammar
%^).
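
The three-stage lookup described above might be sketched in Python roughly as
follows.  This is only an illustration under assumptions: the names
parse_entry and lookup are hypothetical, the thesaurus is assumed to be
already loaded as a plain dict mapping a word to its synonyms, each entry is
assumed to sit on one line, and each stage is treated as a fallback that runs
only when the previous stage found nothing.

```python
import re

def parse_entry(line):
    """Split '*keyword (clarification), ...' into (keyword, clarification, line)."""
    head, _, _ = line.lstrip("*").partition(",")
    keyword, _, clarification = head.partition("(")
    return keyword.strip().lower(), clarification.rstrip(") ").lower(), line

def lookup(word, entries, thesaurus):
    """Return the entry lines matching word, trying three stages in order."""
    word = word.lower()
    parsed = [parse_entry(line) for line in entries]
    # Stage 1: the word is itself an entry keyword.
    hits = [line for key, _, line in parsed if key == word]
    if not hits:
        # Stage 2: the word appears in a parenthesized clarification.
        hits = [line for _, clar, line in parsed
                if word in re.findall(r"[a-z']+", clar)]
    if not hits:
        # Stage 3: thesaurus synonyms, matched at the keyword level only.
        synonyms = {s.lower() for s in thesaurus.get(word, ())}
        hits = [line for key, _, line in parsed if key in synonyms]
    return hits

if __name__ == "__main__":
    entries = ["*like (apparent similarity), /=/ simlu (mlu)"]
    # Made-up thesaurus data, for illustration only.
    thesaurus = {"resemble": ["like"]}
    for query in ("like", "similarity", "resemble"):
        print(query, "->", lookup(query, entries, thesaurus))
```

Returning whole lines leaves room for the selection step: the front end can
show the hits, let the user pick one, and then fetch the full definition line
for that keyword from the unpared file.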

There is one entry format that gfoot left out in his samples, by the way.
When an entry word refers to an oblique place of the Lojban word, that place is
noted in the beginning of the line.

*actors, x5 of:  x1 is a drama/play about x2 [plot/theme/subject] by
dramatist x3 for audience x4 with | x5 /:/ [x2 may also be a convention]
/=/ draci

The "x5 of:" is something that should not be cut out of a pared-down file
which would look something like
*actors, x5 of:/=/ draci

so please amend my description of the paring down process above accordingly
- you cannot always stop at the first comma.  But I suspect that this amount
of sophistication isn't hard to do with awk for people who know that language.
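
For those who do not know awk, the paring-down rule, including the "xN of:"
exception, might look roughly like this in Python.  It is a sketch, not a
tested tool: the function name pare_entry is hypothetical, and it assumes
every entry has the shape of the two samples in this message (a head ending
at the first comma, an optional "xN of:" note right after that comma, and a
final /=/ before the Lojban word).

```python
import re

# Assumed entry shape, taken from the two samples in this message:
#   *keyword (clarification), [xN of:] definition text /=/ lojban-word (rafsi)
OBLIQUE_PLACE = re.compile(r"\s*(x\d+ of:)")

def pare_entry(entry: str) -> str:
    """Drop the definition text; keep head, any 'xN of:' note, and the Lojban word."""
    entry = " ".join(entry.split())           # undo line wrapping
    head, comma, rest = entry.partition(",")  # head = keyword plus clarification
    if not comma or "/=/" not in rest:
        return entry                          # unexpected shape: leave untouched
    body, _, lojban = rest.rpartition("/=/")
    place = OBLIQUE_PLACE.match(body)
    note = f" {place.group(1)}" if place else ""
    return f"{head},{note} /=/ {lojban.strip()}"

if __name__ == "__main__":
    sample = """*actors, x5 of:  x1 is a drama/play about x2 [plot/theme/subject] by
dramatist x3 for audience x4 with | x5 /:/ [x2 may also be a convention]
/=/ draci"""
    print(pare_entry(sample))  # prints: *actors, x5 of: /=/ draci
```

Lines that do not fit the assumed shape are passed through unchanged, so they
can be found and handled by hand afterwards.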

lojbab
----
lojbab                                                lojbab@access.digex.net
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                        703-385-0273
Artificial language Loglan/Lojban: ftp.access.digex.net /pub/access/lojbab
    or see Lojban WWW Server: http://xiron.pc.helsinki.fi/lojban/
    Order _The Complete Lojban Language_ - see our Web pages or ask me.