From: Robin Lee Powell <rlpowell@digitalkingdom.org>
Date: Tue, 24 Apr 2001 11:58:14 -0700
To: lojban@yahoogroups.com
Subject: Re: [lojban] NickFest 2

Holy sweet potato, was that a long post.

lojbab, it looks like you missed the discussion with Jay about building a dictionary entry site, so I'm mostly going to be addressing that in my responses.

On Tue, Apr 24, 2001 at 04:52:26AM -0400, Bob LeChevalier-Logical Language Group wrote:
> The level 1 package has traditionally been a set of wordlists and the
> E-BNF. We pretty much have all the pieces needed to edit these lists into
> another book, which will be a "pocket dictionary" of Lojban. We aren't
> seeking a lot of new material (no more lujvo),

Aaaaawwwww.... _Now_ he tells us. 8)

> though I want to do something to improve the cmavo list. Depending on
> time and money, this could come out by the end of this year.
> The LogFest members meeting will decide whether the pocket dictionary and
> the intro lessons will constitute the baseline dictionary and textbook,
> starting the infamous "5 year freeze period", or whether we should wait
> until we publish a full dictionary and more thorough textbook, which
> projects may start moving along once these other books are done. I will
> admit to wanting to have the full package for the 5 year period myself, but
> it is not solely my decision. You should speak up before LogFest.

I want to wait, not least because I'd rather not start the freeze until we have at least one substantial (read: book-length) translation finished, and more fluent people. How many known fluents do we have, anyways?

> A major hold up on the dictionary has been deciding what to do about cmavo,
> and this also affects the pocket dictionary. The existing cmavo list does
> not actually define most of the words, and the keywords were designed for
> LogFlash, not as proper definitions. We thus have three tasks, and I am
> willing to farm these out to volunteers in whole or in part.

All of these tasks can be done by many people using the stuff Jay is working on. To wit:

A simple, web-based front end, which presents you with an entry for whatever word type you've chosen to work on (probably presented semi-randomly, see below). This form would include text fields for all the necessary information for that word type.

Something Jay and I have not discussed, but that I think would make this stuff go _much_ faster, is to add a voting box (probably labelled "Good As Is"), so that if you are presented with a word that looks like it's just fine, you just check the box and hit submit. If you make any actual changes, the box would be ignored. This would allow 2 things:

1. You (lojbab, nora, &c) could briefly skim all of the words with high confidence (more than 3 people think they're good as is, for example).

2. The presentation of entries could be weighted towards showing entries that have never received a 'good as is', thus concentrating people's work on the relevant stuff.

This would probably require people to sign in to the site, so they wouldn't keep getting entries they'd seen (or edited themselves) before, but I think we all trust each other enough that a password probably won't be necessary.

Just for the record, Jay's form is at http://wiw.org/~jkominek/jbovlaste/ (user is guest, password is guest). Don't try to actually _use_ it yet; it's still a work in progress.

> For the Lojban-English side, we need for each cmavo and compound, English
> keywords if appropriate, a short definition if the word is definable, the
> selma'o (already there), and a pointer to the reference grammar section(s)
> discussing the word or the selma'o. Even a Lojban beginner with a copy of
> the book can work on the latter (and you might learn a lot about the
> language in the process), since it mostly involves looking stuff up. But
> don't volunteer unless you think you can commit enough time to do most or
> all of the either the lookup or the definition task within the next few
> months on your own - we can't afford a coordinator, and even the CVS option
> that Robin is working on seems inappropriate for this because consistency
> in style and look-up strategy/coverage is important and editing that sort
> of thing done by several people might take as long as doing it ourselves.

I _absolutely_ agree with you: CVS is bad for this. But Jay's form forces you to do things in a consistent fashion that can be turned into _any_ form you like. I think it's perfect for this, and the work is sufficiently boring that if you require a small number of highly-committed volunteers, it'll never get done. Having lots of people do a little work is the way to go here, IMO.
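To make the "Good As Is" idea above a bit more concrete, here is a minimal sketch of how the votes might be tallied and used to pick the next entry. This is not part of Jay's form (as said above, we haven't discussed it); every name here (Entry, submit, high_confidence, next_entry) is hypothetical. The "more than 3 votes" threshold comes from the example above, while the 10:1 weighting and the rule that an edit wipes out earlier approvals are assumptions for illustration only.

```python
import random
from dataclasses import dataclass, field

# Threshold from the example in the post: "more than 3 people think
# they're good as is" counts as high confidence.
HIGH_CONFIDENCE_VOTES = 3

# ASSUMPTION: an entry nobody has approved is 10x as likely to be shown
# as one that already has at least one "Good As Is" vote.
UNAPPROVED_WEIGHT = 10.0


@dataclass
class Entry:
    word: str
    good_as_is: set = field(default_factory=set)  # users who checked the box
    seen_by: set = field(default_factory=set)     # users who submitted this entry


def submit(entry, user, changed, good_as_is_checked):
    """Record one form submission; a real edit means the checkbox is ignored."""
    entry.seen_by.add(user)
    if changed:
        # ASSUMPTION: approvals of the old text no longer apply after an edit.
        entry.good_as_is.clear()
    elif good_as_is_checked:
        entry.good_as_is.add(user)


def high_confidence(entries):
    """Entries with enough approvals for a quick skim by the editors."""
    return [e for e in entries if len(e.good_as_is) > HIGH_CONFIDENCE_VOTES]


def next_entry(entries, user, rng=random):
    """Pick an entry this user hasn't seen, weighted toward never-approved ones."""
    candidates = [e for e in entries if user not in e.seen_by]
    if not candidates:
        return None
    weights = [UNAPPROVED_WEIGHT if not e.good_as_is else 1.0 for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

The sign-in requirement shows up only as the seen_by set; no password is involved, matching the "we all trust each other" point. Approved entries are still shown occasionally rather than never, so a bad "Good As Is" vote can eventually be caught.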
> The second task is to go through the rather large accumulation of cmavo
> compounds that have actually been used and decide which of them have a
> simple English definition, and prepare them as per the above with keywords,
> definitions and book references.

Assuming that a list of same can be auto-generated, there's no reason why Jay's stuff can't be modified to make that work as well.

> The existing set of compounds in the cmavo list was determined
> arbitrarily when we set up LogFlash and there are hundreds of other
> compounds to be considered. We'll concentrate on the most frequently
> used. A compound like "lenu" will either be skipped, or defined
> simply as le + nu possibly with a refgrammar reference. I can do a
> first cut at weeding these, or someone can volunteer, but there is no
> sense in starting this while the current cmavo list remains
> unfinished.
>
> The third task will be to prepare English to Lojban entries for
> cmavo. Part of this job will use the results of the above tasks - using
> keyword processing as we used for the gismu list, and formatting the
> resulting entries to look like the others.

Sounds like we should save that for later, but again, there's no reason, IMO, not to use a web-based form.

> The next area to be worked on are fu'ivla, which have hardly been
> tackled. Nick advocates our collecting a fairly large set of easily made
> fu'ivla for plants and animals, (and perhaps other common international
> science words and foods), using the Linnean genus for each animal in the
> Latin ablative case (which gives a consistent vowel ending.
>
> Nick believes that in most cases this can be made trivially by finding out
> Gode's Interlingua word for the plant or animal. So we are seeking people
> willing to do some word mining in the Interlingua dictionary(s) (I believe
> known as the IED), especially people who are willing to do a little
> checking to make sure the words are indeed the genus names.
> I asked Sunday night for volunteers from the Lojban community familiar
> with Interlingua and its dictionary, and failing that will seek help from
> the Interlingua community itself.
>
> We will also systematically create cultural fu'ivla for all countries in
> the world, all languages that are distinct from country names having
> greater than N speakers (N ~ 1 million to 10 million, probably); for these
> we will have to make the perhaps difficult effort to find out what the name
> of these languages and countries are in the native language, which will
> take some research. Multiple people can work on this and it can be done
> using the CVS approach.
>
> We will then add in any scientific words from the Interlingua wordmining,
> again assuming that the Latinate "prototype" wordform that defines the word
> in that language is probably the most international form we can find.
> Making the words into valid fu'ivla and coming up with a consistent format
> for definitions will be the final step, but these will be relatively easy
> steps once the words are assembled because of the limited and regular
> semantics.

Again, once the word list is compiled, the definitions can be done with Jay's stuff. (BTW, Jay, you da man. 8)

> The lujvo will be the hardest project.

For the record, it's the lujvo that Jay's site was originally designed for, although I think we were both under the impression that you wanted more of them, which doesn't appear to be the case.

> The biggest accomplishment of the weekend is that Nick and Nora and
> Shawn Lasseter went through the entire list of lujvo used prior to
> January 2000 and either assigned keywords, or marked for research
> every one of them. About 30% of the words need to be looked up for
> context, which means that we have around 3500 more lujvo keyworded
> than we had before with 1500 potentially to be looked up. I also have
> around 1000 more lujvo used for the first time during 2000 which have
> not been done.
> These additional words will be keyworded and looked up
> in decreasing frequency of usage order "until we are sick of it",
> probably cutting off at the 10 or 5 usages level. The new searchable
> archive of Lojban List is proving extremely helpful in making the
> context searches, and I am hoping that the yahoogroups archives can be
> added in to that archive somehow to make things even easier for newly
> made words.

Shouldn't be hard; wget will grab all of the yahoogroups archives trivially.

> The hard task will of course be place structures.

And it's for that that Jay's stuff was originally proposed.

> We have of course got Nick's prior efforts at place-structure making
> from 1994, as well as an automated effort to build place structures
> for conversion lujvo using se, te, etc. plus a gismu. Nick has
> suggested using a similar automated procedure to generate lujvo and
> place structures for the special cases based on nu, ka, ni, mau, tol,
> nau, gau, sim, etc. This may take care of a good chunk of the lujvo
> already made.
> For the remaining place structures, Nick feels that we need to abandon our
> attempt at perfection and careful analysis for each word in the
> dictionary. Instead, we should have a series of code symbols or font
> coding to indicate the level of confidence that we have in the place
> structure, and also to include a code for the place structure writer, who
> will have more or less credibility based on his perceived knowledge of the
> language and amount of lujvo analysis done. We also will not try to have
> complete place structures for every word we put in the dictionary, using
> the symbol codes to show the level of incompleteness.

I think all of this can be done automagically with the checkbox I proposed.

> The keywords, of course, typically define the x1 of any lujvo.
> For those lujvo that are generally namelike - used only as concrete
> references in sumti, we expect to stop at x1, though I myself would like
> to have an attempt to determine an x2 for each where it makes sense, since
> we have few brivla that are only one-place, and I think that one-place
> brivla will tend to damage the predicate nature of the language unless the
> words really are one-placers conceptually. But even I agree that we should
> start with the single place for these.
>
> Those lujvo that are verblike, or which can translate as English verbs,
> should have complete place structures worked out. These lujvo will be the
> workhorses of the bridi structures in the language. We'll start with at
> least two places, and add oblique places if they are identifiable.
> As with keywording, place structure work will be prioritized to emphasize
> words with higher frequency counts. We will use whatever net-based tools
> such as CVS as people think are best suited.
> BUT, we will put extremely low priority on including new proposed
> lujvo. We simply have to draw the line somewhere. Proposals that have
> complete place structures may be considered, if they have been compiled in
> a single place, such as Arnt's collection in the lojban list file
> section. But we aren't going to be looking for these.
> Finally, Nick has asked that we again call for a volunteer who can take his
> old "Adventure" (colossal cave) text translation and generate a Lojban
> version of the game. I have in the past noted that there is a system
> called "Inform" which is now used to generate adventure games easily for
> all platforms, and it has specific instructions for creating
> language-specific versions. There is an entire newsgroup dedicated to
> interactive-fiction writing using Inform and other tools. Someone can find
> a version of the colossal cave adventure using Inform and use Nick's
> translation to complete it.
> I also have an old hardcoded version of the
> program, I believe in some form of Basic, that can be modified by a good
> programmer; I started several years ago, and don't remember how far I got,
> but I was hampered by the need to figure out what the code was doing - I
> think it was poorly commented. This old version had some portions of the
> adventure hard-coded into the program (including the command set, which
> should be in Lojban with a "ko " prompt so that the user is forced to enter
> imperative commands), and that hardcoded part has not been translated, but
> the data files for the Basic program is what Nick completed. The inform
> version probably has all the hard coded stuff in data files, but I haven't
> studied this.
>
> The job should not be that difficult for anyone with programming
> experience, and it might not take much Lojban expertise. The person who
> does this will likely learn how to use the Inform tool and can then
> coordinate translations of other adventure games (there are hundreds of
> them out there in a single repository file site in Germany and it has
> become a significant creative genre with annual competitions - plenty of
> good Lojban-learning experiences) or can attempt to write new ones.
>
> Because this task has languished so long and because the potential is high
> for getting lots of good stuff going, I would like at least two people to
> volunteer to work on this either together or independently (the inform work
> can easily be done independently of any attempt to fix the Basic program).

I, myself, have no interest in doing this because adventure games annoy the heck out of me (the puzzle solving just pisses me off).

I am, however, seriously considering creating a lojbanic MUD which, IMO, serves all the purposes of making a lojbanic Adventure clone (and can certainly have puzzles incorporated into it) except that it's multi-user, so it provides for interaction as well.

How much interest do the Powers That Be have in this?
It's going to take a _lot_ of work, and I don't know that I want to get into it if people don't see it as valuable.

-Robin

--
http://www.digitalkingdom.org/~rlpowell/ BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno je xlali -- RLP
http://www.lojban.org/