Date: Sat, 11 Jul 92 04:28:17 -0400
From: lojbab@grebyn.com (Logical Language Group)
Message-Id: <9207110828.AA17425@daily.grebyn.com>
To: cowan@snark.thyrsus.com, nsn@mullian.ee.mu.oz.au
Subject: rafis tuning; e'o ko jinvi cusku
Content-Length: 11654
Lines: 197

Time for me to ask for some input on rafsi tuning. Things are getting to be a sticky wicket. I'm asking you, Nick, because you've asked for some specific rafsi changes, hence must feel some amount of tuning is appropriate. I think so too, and Cowan does, but possibly in a more limited sense. Most of the other Jimbobs have said little or nothing (indeed, Ivan, Mark, and Colin never did do anything more with that lujvo file after you did the first cut on it, and all three are out-of-town at the critical time anyway). We have some new people like Iain worthy of joining the Jimbobs, but at this point time is too pressing to bring them up to speed like I did with you last year - I think I can explain things to you more briefly than I can to new people, and get a good feel for what is right based on your response. I'm also ccing this to Cowan, but he and I have and will be able to talk further.

Having said this, I will now note that Nora seems to be almost completely opposed to rafsi tuning. The fact that we have made it a point to NOT baseline this list, because we planned long ago for an expected necessary tuning, doesn't mean anything to her. The purpose for the delay, of course, was to get actual usage data to buttress the theoretical statistics, which were largely based on JCB's group's truly appalling metaphor-making practices and almost no real usage of these enormous lists of proposed words. But Nora feels that the mere soliciting of people to use the language cements the design in place. She is at the extreme on the question of relearning of ANYTHING in the language; it must truly be broken for her to believe it needs fixing if ANYONE has used the feature.
Her argument is that JCB drove many people off by endless fiddling with apparently solid parts of the design, and part of our charter was to stop the fiddling, and let the people have and use the language. She might be willing to see some minor changes based on trying to squeeze in this word with a rafsi here and that word with a rafsi there, if consensus demands that changes be made.

I will note that she feels the same way about the place structures. She would rather have a cast-in-concrete bad or clumsy design feature than one that changes after someone - ANYONE - has made an effort to learn it. And she doesn't want old Lojban text to be rendered invalid due to rafsi or place structure changes. I think the latter is inevitable, if not now, then when we give up control, so I have much less resistance than she does. On the other hand, I have the most relearning at stake when things change (a fact I use to temper my own judgement whenever I consider a possible change).

The problem is that this isn't generally possible. Unlike the cmavo, we have rafsi possibilities for each word that are very limitedly derived from the base gismu wordform. With the cmavo, we knew this wasn't possible, and didn't try. But the lujvo space is nearly as packed as the cmavo space, and is especially dense in certain parts of the alphabet, and it is those parts, of course, that are under stress for change from new words. Thus, changes in one word cause cascading changes to a whole series of other words, all forced. Sooner or later the chain fizzles out because you run into a hole in the rafsi list, or more frequently, into a word wherein it matters little if it loses its rafsi because it may only be used in one or two lujvo that no one remembers.
(My older data is totally statistical, and I cannot trace back to the words that caused the scoring - I have JCB's 200 page dictionary, 200 pages of lujvo proposals, 3000 words of Eaton data, and a bunch of other margin notes that are in some cases no longer interpretable.)

Explaining rafsi tuning to the masses would be very difficult.

The actual usage data of the last few years actually does outweigh the earlier data in my statistics, because the old data had some bad skewings. However, the differences between usage and theory are not enormous. The old data with the current rafsi assignments gives an efficiency score of 94.5%; i.e., in 94.5% of all proposed lujvo rafsi positions, a short rafsi exists for the word in question that can be used in that position. With the new data, this score drops to 92%, which is not that much of a change. But we're talking some 10000-20000 words, so 2.5% is some 250-500 lujvo that use long-form rafsi, and these are presumably all from the new words. Since the new data was only around 2500 lujvo, even weighted at about double weight, you can see that the efficiency of the current rafsi assignments is nowhere near 94.5% or 92%.

The data I have shows several gismu with a half dozen lujvo and no rafsi assigned. Twery's fetish has added 30 words that use cinse (luckily weighted low, because he seldom uses them), but I think skicu, for example, is a bigger problem, with velskicu actually much more common than the bare gismu itself (change the place structure?). The worst off 25 gismu in terms of rafsi needs have an average of a half-dozen lujvo each where they need rafsi. Note that the statistics do not even allow for hyphenation - kaz and niz are perfectly good rafsi for ka and ni in my statistics, regardless of a significant consensus that they need better ones.
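For anyone who wants to see the efficiency score concretely, here is a rough sketch in Python of how such a score could be computed. It is not the actual scoring program. The data layout, the names `efficiency`, `usable`, `sample_rafsi` and `sample_lujvo`, and the toy table are all assumptions made for illustration. It also assumes a simplified position rule: any short rafsi fits a non-final position, while a final position needs a rafsi ending in a vowel (CVV or CCV). Like the statistics described above, it ignores hyphenation.

```python
VOWELS = set("aeiou")

def usable(rafsi, is_final):
    """Assumed rule: any short rafsi fits a non-final slot;
    a final slot needs a rafsi that ends in a vowel (CVV or CCV)."""
    return (not is_final) or rafsi[-1] in VOWELS

def efficiency(lujvo_list, rafsi_table):
    """Weighted fraction of lujvo positions that have a usable short rafsi.

    lujvo_list:  iterable of (component_gismu_tuple, weight) pairs
    rafsi_table: dict mapping gismu -> list of assigned short rafsi
    """
    covered = 0.0
    total = 0.0
    for components, weight in lujvo_list:
        last = len(components) - 1
        for i, gismu in enumerate(components):
            total += weight
            candidates = rafsi_table.get(gismu, [])
            if any(usable(r, i == last) for r in candidates):
                covered += weight
    return covered / total if total else 0.0

# Hypothetical toy data, not the real list: barda has only the CVC "bad",
# zmadu has both "mau" and "zma".
sample_rafsi = {"barda": ["bad"], "zmadu": ["mau", "zma"]}
sample_lujvo = [
    (("barda", "barda"), 1.0),  # final barda has no vowel-final rafsi
    (("barda", "zmadu"), 1.0),  # both positions are covered
]
score = efficiency(sample_lujvo, sample_rafsi)
```

On the toy data three of the four positions are covered, so the score is 0.75. Giving barda a CCV or CVV rafsi would cover the remaining position, which is the kind of gain that final-position assignments are being traded for.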
Doing a full tuning of the rafsi list, as I've been slogging at for the last week, stands to raise that efficiency score only a couple of percent, probably not even as high as the 94.5% it was at, because the diverse usages that you and others have applied the language to have meant that ever more gismu need some rafsi, and they aren't available. CVV rafsi are especially dear, if it is desired to shorten much-used words like cinse and skicu in final position. But nanba and nanca both compete for na'a (with equal weights, but only one can win), and ritli, rirni, and friti are currently in a dead heat for "ri'i" (finti displaced friti for "fi'i" early in the game since people realized that finti rather than ciska was the basis for author and artistic creation).

The change rate is running at 12-15% of the rafsi so far, and I'm 1/2 done and accelerating - but I've put off the tough decisions till the trade-offs were clearer, and things are getting tough. So I get to ask for advice exactly once, because if we spend time exchanging ideas at a couple of days per exchange, the damned job will never get done.

1. As the premier maker of lujvo in actual Lojban text, how bothered are you by the prospect of 15% of the rafsi changing? I expect that we will be able to write software that can build new words for each existing lujvo, allowing for automatic update of the texts to the changed language. But this makes no allowance for:

a) the extent to which you've learned rafsi and hence don't look them up (which as you know has been a problem with your texts anyway - you guess wrong on some rafsi)

b) sound and rhyme patterns in poetry (Helsem's work, primarily not on computer, isn't even in the statistics)

c) intentional choice of long word forms where short ones might exist.

d) related to c) - the scoring algorithm of the lujvo maker may differ from what human beings might choose as the most preferable lujvo form.

2.
Typical problems:

a) Actually this is the tough one hanging now: zmadu has both mau and zma - there is no competition for zma, and a bit for mau, but mau is the rafsi that most people use, and of course matches the related cmavo. By freeing mau, it can be used for other words, forming a loosening chain that eventually allows a CVV rafsi to be assigned to a word that doesn't have one. In this case, the best chain is especially ugly: it gives mau to cmalu (talk about a rafsi change causing a meaning change), then cma to cmana, freeing ca'a for cabra, which then gives bra to barda (which no longer needs 'bad' - and I can't remember what gets it). So your translation would become brabra kevna.

I have some half dozen usages of barda in final position, all with significant frequency: superlative-big, mildly-big, ni-big, ka-big, etc., besides badbarda. It thus contributes significantly to the efficiency score loss. The other final position rafsi it can use is ba'a, which barna uses. Based on pure statistics alone, if zmadu and cmalu don't change, barda gets bra, cabra takes ca'a, and cmana either goes rafsi-less, gets a CVC that doesn't help with hill (cmaca'a) and volcano (pojyca'a), or takes ma'a from mamta, which has about equal need for it, and some historical claim (JCB had ma'a for his word for mother).

These kinds of choices Nora finds distressing. She has little advice, and she votes no on >every< change, unless there is consensus to go ahead - i.e. she wants to hold to baseline rules even though the list is not baselined. I'm inclined, with pain, to drop the CVV for mamta, or possibly cmana. There is symmetry in a CCV for both cmalu and barda, and even I don't like the radical change in meaning for "mau". But I have to admit that in general I prefer, wherever practical, to let statistics decide when to make a change, because of the fear of cultural bias creeping in.

b) a lighter change - but showing the depth of difficulties.
lojbab has traditionally been humorously translated as "logical-soap", and Nora based one comic strip on that interpretation. But it now turns out that zbabu could also use "zab", which is currently not in use, letting blabi have "bab" and either lanbi (protein - probably the one we'd choose) or labno (wolf) get "lab". Neither of the two has been used in any lujvo thus far, but protein seems likely to eventually be used in some, and to also have use in making le'avla. So here we have a tuning decision based on a hypothetical rather than actual use of a gismu in lujvo, as a tradeoff against a single humorous use of the existing assignment (logical white just doesn't fit the comic strip, and it isn't merely a wording change, either).

c) Some other rafsi assignments I am considering cast in concrete. The words brivla, selma'o, and le'avla will not change again - too many of our materials have these embedded to consider a change. I'm actually putting the cut-off of importance much lower than these words - "badbarda" is almost memorable enough for me to lock it in, but then I stop to realize that there is no evidence that anyone but jimc and myself has ever read more than a couple of sentences. But bavlamdei is on the line - I gave bav to balvi even though it has also gained bla from blanu and no longer needs a CVC - the only word that could use bav is bavmi, and barley doesn't strike me as being a high-usage lujvo source. But I expect that bav will fade away, and bavlamdei change to blalamdei (or whatever, if the other rafsi change) - the word is much used around here, but not much elsewhere to my knowledge.

3.
What I expect to do when done is to present the new list as a whole but also to try to pick out the multiple chains of changes from within so I can attempt to pragmatically explain what caused each if it is identifiable, and allow each chain, to the extent it is independent of others, to be decided individually, rather than to approve the entire change package as a single lump. But this is a lot of work if most people don't really care. Is it worth the effort to you? 4. What does seem clear from Nora's resistance, is that the rafsi list MUST be baselined when any changes from this exercise get decided. So I intend a vote for baseline at LogFest, with additional voting from yourself and other significants who cannot be present. Opinions needed quickly on whatever this essay prompts in your mind.