Date: Wed, 21 Oct 1998 05:11:29 -0400
Reply-To: Logical Language Group
Sender: Lojban list
From: Logical Language Group
Subject: Lojban word lists, etc.
X-To: lojban@cuvmb.cc.columbia.edu
To: Multiple recipients of list LOJBAN
Message-ID: <908961324.2029323.0@listserv.cuny.edu>

>On Tue, 20 Oct 1998, C.D. Wright wrote:
>
>> Sorry about the multiple posts in quick succession, but before
>> I get even more emails telling me about the gismu list at ...
>>
>> ftp://ftp.access.digex.net/pub/access/lojbab/wordlists/gismu
>>
>> Let me tell you why it's next to useless for me.
>>...
>> I need/want an English/lojban dictionary that doesn't give just the
>> entries that have close equivalents, but gives near matches as well.
>> That's what I'm trying to compile.
>
>Have you seen engdict.gis?  It's a dictionary (English->Lojban)
>of sorts.

I will second this suggestion, since it is the one I would have made.
It is, by the way, not merely a "dictionary of sorts", but the unedited
draft of the dictionary we are eventually going to publish. It still
needs some weeding of entries that appear in multiple forms because
several files were merged.
The list is also missing a few dozen English words that match dozens of
lines each when grepped in the gismu list; those entries take a lot more
weeding and reformatting to put into proper dictionary form. And we have
not yet dealt with the cmavo list, or with the ever-growing number of new
lujvo that crop up in text. But what is in that file is the core of the
dictionary-to-come, in roughly the entry style that we expect the
dictionary to use (the lines will of course be formatted so as to be
page-readable).

Now what it sounds like Colin Wright might want is to take all lines
formatted as:

>  *like (apparent similarity), x1 seems/appears to have
>    property(ies) x2 to observer x3 under conditions
>    x4 /:/ [also: x1 seems | it has x2 to x3; suggest
>    belief/observation (= mlugau, mluti'i); looks
>    |/resembles (= smimlu, mitmlu)] /=/ simlu (mlu)

and eliminate the text between the first comma (which ends the entry word
and its clarification of semantics) and the /=/ that indicates the Lojban
word follows. That will give a much smaller greppable file that will also
include words that are present only in the semantic clarification, which
may provide a few synonyms that are not actual entries.

A second-level approach would be to connect an English-language thesaurus
to this list, and write a front end that would first look for the word
among the keywords, then look among the semantic clarifications in parens,
and then look in the thesaurus for synonyms, which should then be looked
up only at the keyword level. There are public domain thesauri on the net
(see my mention a few days ago of the Gutenberg edition of Roget) to serve
as a basis for this.

I will propose that Colin Wright or some ambitious programmer with time
on his hands volunteer to put together such a front end, which would
essentially provide all of us with an on-line dictionary.
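The three-stage lookup described above might be sketched roughly as follows. This is only an illustration in Python, not an existing tool: the function names, the regular expression, and the tiny thesaurus dictionary are all hypothetical, and it assumes each entry sits on a single line in the shape of the sample above (entry word, optional clarification in parens, a comma, then the definition up to /=/).

```python
import re

# Assumed entry shape, inferred from the sample in the message:
#   *keyword (clarification), definition ... /=/ lojban (rafsi)
ENTRY_RE = re.compile(
    r"^\*(?P<key>[^,(]+?)\s*"        # entry word, up to "(" or ","
    r"(?:\((?P<clar>[^)]*)\))?\s*,"  # optional semantic clarification in parens
    r".*?/=/\s*(?P<lojban>\S+)"      # Lojban word right after the /=/ marker
)


def parse_entries(lines):
    """Return (keyword, clarification, lojban, full_line) for each matching line."""
    entries = []
    for line in lines:
        line = line.strip()
        m = ENTRY_RE.match(line)
        if m:
            entries.append(
                (m["key"].lower(), (m["clar"] or "").lower(), m["lojban"], line)
            )
    return entries


def lookup(word, entries, thesaurus):
    """Search keywords, then clarifications, then thesaurus synonyms."""
    word = word.lower()
    # Stage 1: exact match on the entry keyword.
    hits = [e for e in entries if e[0] == word]
    if hits:
        return hits
    # Stage 2: the word appears in the parenthesised clarification.
    hits = [e for e in entries if word in re.findall(r"[a-z']+", e[1])]
    if hits:
        return hits
    # Stage 3: synonyms from the thesaurus, matched at keyword level only.
    synonyms = {s.lower() for s in thesaurus.get(word, [])}
    return [e for e in entries if e[0] in synonyms]


# Hypothetical sample data: one entry from the message and a made-up
# one-word thesaurus standing in for a real public domain one.
SAMPLE_LINES = [
    "*like (apparent similarity), x1 seems/appears to have property(ies) x2 "
    "to observer x3 under conditions x4 /:/ [also: x1 seems | it has x2 to x3] "
    "/=/ simlu (mlu)",
]
SAMPLE_THESAURUS = {"resemble": ["like", "mirror"]}

if __name__ == "__main__":
    entries = parse_entries(SAMPLE_LINES)
    for query in ("like", "similarity", "resemble"):
        for key, clar, lojban, full in lookup(query, entries, SAMPLE_THESAURUS):
            print(query, "->", lojban)
```

Each result keeps the full entry line, so a caller can still show the complete definition once a match has been chosen.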
(An obvious addition to such an effort would display all entries found by
the search described above, allow the user to select one, and then
retrieve the full definition line so that the place structure is
available. Needless to say, this is a project that could expand through
enhancements into a major undertaking. John Cowan started 6 or 7 years
ago to provide me with examples for each cmavo in the cmavo list and
ended up writing the reference grammar %^).

There is one entry format that gfoot left out in his samples, by the way.
When an entry word refers to an oblique place of the Lojban word, that
place is noted at the beginning of the line.

*actors, x5 of: x1 is a drama/play about x2 [plot/theme/subject]
    by dramatist x3 for audience x4 with | x5 /:/ [x2 may also be a
    convention] /=/ draci

The "x5 of:" is something that should not be cut out of a pared-down
file, which would look something like

*actors, x5 of:/=/ draci

so please amend my description of the paring-down process above
accordingly - you cannot always stop at the first comma. But I suspect
that this amount of sophistication isn't hard to do with awk for people
who know that language.

lojbab
----
lojbab                                           lojbab@access.digex.net
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                   703-385-0273
Artificial language Loglan/Lojban: ftp.access.digex.net /pub/access/lojbab
    or see Lojban WWW Server: http://xiron.pc.helsinki.fi/lojban/
    Order _The Complete Lojban Language_ - see our Web pages or ask me.
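For anyone who wants to try the paring-down step, here is a rough sketch of it with the "x5 of:" amendment folded in. It is written in Python rather than the awk suggested in the message, and it rests on assumptions: one entry per line, the first comma always ends the entry word (and any clarification), and an oblique place always looks like "xN of:" directly after that comma. The function name `pare` is made up for the illustration.

```python
import re

# Optional oblique-place note such as " x5 of:" directly after the first comma.
PLACE_PREFIX_RE = re.compile(r"\s*x\d+ of:")


def pare(line):
    """Drop the definition text between the entry word and the /=/ marker.

    Lines with no /=/ marker, or no comma before it, are returned unchanged.
    """
    head, marker, lojban = line.partition("/=/")
    comma = head.find(",")
    if not marker or comma == -1:
        return line
    keep_to = comma + 1  # keep everything through the first comma
    m = PLACE_PREFIX_RE.match(head, keep_to)
    if m:
        keep_to = m.end()  # also keep the "xN of:" note
    return head[:keep_to] + "/=/" + lojban


if __name__ == "__main__":
    entry = (
        "*actors, x5 of: x1 is a drama/play about x2 [plot/theme/subject] "
        "by dramatist x3 for audience x4 with | x5 /:/ [x2 may also be a "
        "convention] /=/ draci"
    )
    print(pare(entry))  # prints "*actors, x5 of:/=/ draci"
```

A clarification that itself contains a comma would be cut short by this rule, so a real file would need checking for such cases before the output is trusted.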