From: "Bob LeChevalier (lojbab)"
Date: Tue, 24 Apr 2001 17:46:54 -0400
To: lojban@yahoogroups.com
Subject: Re: [lojban] NickFest 2

At 11:58 AM 04/24/2001 -0700, Robin Lee Powell wrote:
>Holy sweet potato, was that a long post.
>
>lojbab, it looks like you missed the discussion with Jay about building
>a dictionary entry site, so I'm going to mostly be addressing that in my
>responses.

I didn't miss it, but was not paying close attention to the details.

>On Tue, Apr 24, 2001 at 04:52:26AM -0400, Bob LeChevalier-Logical Language
>Group wrote:
> > The level 1 package has traditionally been a set of wordlists and the
> > E-BNF. We pretty much have all the pieces needed to edit these lists into
> > another book, which will be a "pocket dictionary" of Lojban. We aren't
> > seeking a lot of new material (no more lujvo),
>
>Aaaaawwwww....
>
>_Now_ he tells us. 8)

That's for the pocket dictionary, which has to have some limits or it won't
fit in a pocket.
The unabridged dictionary, not to mention any on-line version, can still be
added to, as I think I noted.

> > though I want to do something to improve the cmavo list. Depending on
> > time and money, this could come out by the end of this year.
> >

But this really requires that we minimize new stuff for the pocket
dictionary.

> > The LogFest members meeting will decide whether the pocket dictionary and
> > the intro lessons will constitute the baseline dictionary and textbook,
> > starting the infamous "5 year freeze period", or whether we should wait
> > until we publish a full dictionary and more thorough textbook, which
> > projects may start moving along once these other books are done. I will
> > admit to wanting to have the full package for the 5 year period myself,
> > but it is not solely my decision. You should speak up before LogFest.
>
>I want to wait, not least because I'd rather not start the freeze until
>we have at least one substantial (read: book-length) translation
>finished, and more fluent people.
>
>How many known fluents do we have, anyways?

The only known ones are Nick and Goran. Several others can carry on a
halting conversation, including my wife and me (and I don't use a wordlist
when I do so, so I am just one step short of fluency, but have never made
that last step because I cannot stop myself from translating rather than
trying to think in Lojban). xod has promised to be skilled enough to speak
Lojban only during the entirety of the next LogFest, and Mark Shoulson has
taken that as a challenge to him to do the same. Both have enough skill to
be fluent given a solid weekend of live practice.

> > A major hold up on the dictionary has been deciding what to do about
> > cmavo, and this also affects the pocket dictionary. The existing cmavo
> > list does not actually define most of the words, and the keywords were
> > designed for LogFlash, not as proper definitions.
> > We thus have three tasks, and I am willing to farm these out to
> > volunteers in whole or in part.
> >
>
>All of these tasks can be done by many people using the stuff Jay is
>working on.

Possibly, but consistency is going to be a problem. If the work is not done
consistently on the cmavo list, it will probably have to be redone (as you
acknowledge below). What we described for lujvo allows for differing levels
of quality in the definitions, but I don't want to have to deal with that
with the other word classes.

>Something Jay and I have not discussed, but that I think would make this
>stuff go _much_ faster, is to add in a voting box (probably labelled
>"Good As Is"), so that if you are presented with a word that looks like
>it's just fine, you just check the box and hit submit. If you make any
>actual changes, the box would be ignored.
>
>This would allow 2 things:
>
>1. You (lojbab, nora, &c) could skim briefly all of the words with high
>confidence (more than 3 people think they're good as is, for example).
>
>2. The presentation of entries could be weighted towards showing
>entries that have never received a 'good as is', thus concentrating
>people's work on the relevant stuff.

This all sounds very good for the lujvo effort. I'm less sure it would help
for the cmavo effort, in part because I'm skeptical that the volunteers will
show up in force.

> > For the Lojban-English side, we need for each cmavo and compound, English
> > keywords if appropriate, a short definition if the word is definable, the
> > selma'o (already there), and a pointer to the reference grammar section(s)
> > discussing the word or the selma'o. Even a Lojban beginner with a copy of
> > the book can work on the latter (and you might learn a lot about the
> > language in the process), since it mostly involves looking stuff up.
> > But don't volunteer unless you think you can commit enough time to do
> > most or all of either the lookup or the definition task within the next
> > few months on your own - we can't afford a coordinator, and even the CVS
> > option that Robin is working on seems inappropriate for this because
> > consistency in style and look-up strategy/coverage is important and
> > editing that sort of thing done by several people might take as long as
> > doing it ourselves.
>
>I _absolutely_ agree with you: CVS is bad for this. But Jay's form
>forces you to do things in a consistent fashion that can be turned into
>_any_ form you like. I think it's perfect for this and, IMO, the work
>is sufficiently boring that if you require a small number of
>highly-committed volunteers, it'll never get done. Having lots of
>people do a little work is the way to go here, IMO.

I'm willing to try anything at least for a little, but I'll remain skeptical
till I see some response. The Lojban community has certainly grown more
active of late, but we have a history of small scale volunteer efforts never
accomplishing anything.

> > The second task is to go through the rather large accumulation of cmavo
> > compounds that have actually been used and decide which of them have a
> > simple English definition, and prepare them as per the above with
> > keywords, definitions and book references.
>
>Assuming that a list of same can be auto-generated, there's no reason
>why Jay's stuff can't be modified to make that work as well.

Fine. But I have to admit that when I do this work, I do it several hundred
words at a time, and I myself would hate to fill out forms on a word by word
basis. Maybe that is why I am skeptical - the only way these projects have
gotten done in the past has been for people to put in hours doing chunks of
a couple hundred words at a shot.

> > The third task will be to prepare English to Lojban entries for
> > cmavo.
> > Part of this job will use the results of the above tasks - using
> > keyword processing as we used for the gismu list, and formatting the
> > resulting entries to look like the others.
> >
>
>Sounds like we save that for later, but again, there's no reason, IMO,
>not to use a web-based form.

We'll probably use automated techniques for this. Again, too many words to
do one at a time.

>Again, once the word list is compiled, the definitions can be done with
>Jay's stuff.
>
>(BTW, Jay, you da man. 8)
>
> > The lujvo will be the hardest project.
>
>For the record, it's the lujvo that Jay's site was originally designed
>for, although I think we were both under the impression that you wanted
>more of them, which doesn't appear to be the case.

I have always been on record that what we want is to see the words that
actually get used in the language defined. We do need some more semantic
coverage in some areas, but it is premature to figure what areas these are
when we can't determine what words we already have.

> > The new searchable
> > archive of Lojban List is proving extremely helpful in making the
> > context searches, and I am hoping that the yahoogroups archives can be
> > added in to that archive somehow to make things even easier for newly
> > made words.
>
>Shouldn't be hard; wget will grab all of the yahoogroups archives
>trivially.

If you can grab and expand the text archives on lojban.org into the mix,
we'll have 95%+ of the stuff that the words were extracted from.

> > The hard task will of course be place structures.
>
>And it's for that that Jay's stuff was originally proposed.

I'll look forward to seeing results.

> > For the remaining place structures, Nick feels that we need to abandon our
> > attempt at perfection and careful analysis for each word in the
> > dictionary.
> > Instead, we should have a series of code symbols or font
> > coding to indicate the level of confidence that we have in the place
> > structure, and also to include a code for the place structure writer, who
> > will have more or less credibility based on his perceived knowledge of the
> > language and amount of lujvo analysis done. We also will not try to have
> > complete place structures for every word we put in the dictionary, using
> > the symbol codes to show the level of incompleteness.
>
>I think all of this can be done automagically with the checkbox I
>proposed.

OK.

> > Because this task has languished so long and because the potential is high
> > for getting lots of good stuff going, I would like at least two people to
> > volunteer to work on this either together or independently (the inform
> > work can easily be done independently of any attempt to fix the Basic
> > program).
>
>I, myself, have no interest in doing this because adventure games annoy
>the heck out of me (the puzzle solving just pisses me off).
>
>I am, however, seriously considering creating a lojbanic MUD which, IMO,
>serves all the purposes of making a lojbanic Adventure clone (and can
>certainly have puzzles incorporated into it) except that it's
>multi-user, so it provides for interaction as well.
>
>How much interest do the Powers That Be have in this? It's going to
>take a _lot_ of work, and I don't know that I want to get into it if
>people don't see it as valuable.

I have one volunteer already. The advantage of Colossal Cave is that most
people know how to solve the puzzles if they have ever played an adventure
game, it being the original model. That makes it an exercise in reading the
Lojban so you know the (Lojban) command appropriate to solving the puzzle
that you already know how to solve.
The main reasons it is "important" are that we don't have this sort of thing
(your MUD effort will be welcome, in other words), coupled with the fact
that Nick translated the text files 8 years ago, an enormous 107K of Lojban
text from the time when people were still struggling to write sentences, and
they have yet to see the light of day (the text is hidden on the text
archive, but I doubt that anyone has seen it there amidst the megabytes of
other stuff). Some of Nick's best Lojban work and the largest text ever
translated has thus never been read.

lojbab

--
lojbab                                    lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA      703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org