rafsi tuning; e'o ko jinvi cusku
Time for me to ask for some input on rafsi tuning. Things are getting
to be a sticky wicket. I'm asking you, Nick, because you've asked for
some specific rafsi changes, hence must feel some amount of tuning is
appropriate. I think so too, and Cowan does, but possibly in a more
limited sense. Most of the other Jimbobs have said little or nothing
(indeed, Ivan, Mark, and Colin never did do anything more with that
lujvo file after you did the first cut on it, and all three are
out-of-town at the critical time anyway). We have some new people like
Iain worthy of joining the Jimbobs, but at this point time is too
pressing to bring them up to speed like I did with you last year - I
think I can explain things to you more briefly than I can to new people,
and get a good feel for what is right based on your response. I'm also
ccing this to Cowan, but he and I have talked and will be able to talk further.
Having said this, I will now note that Nora seems to be almost
completely opposed to rafsi tuning. The fact that we have made it a
point to NOT baseline this list because we planned long ago for an
expected necessary tuning, doesn't mean anything to her. The purpose
for the delay, of course, was to get actual usage data to buttress the
theoretical statistics largely based on JCB's group's truly appalling
metaphor-making practices, and almost no real usage of these enormous
lists of proposed words. But Nora feels that the mere soliciting of
people to use the language cements the design in place. She is at the
extreme on the question of relearning of ANYTHING in the language; it
must truly be broken for her to believe it needs fixing if ANYONE has
used the feature. Her argument is that JCB drove many people off by
endless fiddling with apparently solid parts of the design, and part of
our charter was to stop the fiddling, and let the people have and use
the language. She might be willing to see some minor changes based on
trying to squeeze in this word with a rafsi here and that word with a
rafsi there, if consensus demands that changes be made.
I will note that she feels the same way about the place structures. She
would rather have a cast-in-concrete bad or clumsy design feature than
one that changes after someone - ANYONE - has made an effort to learn
it. And she doesn't want old Lojban text to be rendered invalid due to
rafsi or place structure changes. I think the latter is inevitable, if
not now, then when we give up control, so I have much less resistance
than she does. On the other hand, I have the most relearning at stake
when things change (a fact I use to temper my own judgement whenever I
consider a possible change).
The problem is that such isolated changes aren't generally possible. Unlike the cmavo, we
have rafsi possibilities for each word that are very limitedly derived
from the base gismu wordform. With the cmavo, we knew this wasn't
possible, and didn't try. But the lujvo space is nearly as packed as
the cmavo space, and is especially dense in certain parts of the
alphabet, and it is those parts, of course, that are under stress for
change from new words. Thus, changes in one word cause cascading
changes to a whole series of other words, all forced. Sooner or later
the chain fizzles out because you run into a hole in the rafsi list, or
more frequently, into a word wherein it matters little if it loses its
rafsi because it may only be used in one or two lujvo that no one
remembers. (My older data is totally statistical, and I cannot trace
back to the words that caused the scoring - I have JCB's 200 page
dictionary, 200 pages of lujvo proposals, 3000 words of Eaton data, and
a bunch of other margin notes that are in some cases no longer
interpretable.)
Explaining rafsi tuning to the masses would be very difficult. The
actual usage data of the last few years actually does outweigh the
earlier data in my statistics, because the old data had some bad
skewings. However, the differences between usage and theory are not
enormous. The old data with the current rafsi assignments gives an
efficiency score of 94.5%; i.e., in 94.5% of all proposed lujvo rafsi
positions, a short rafsi exists for the word in question that can be
used in that position.
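
As an illustration only, the efficiency score described above could be computed along these lines. This is a minimal sketch, not the actual scoring program: the function names, the weights, and the sample data are all hypothetical, and the only rule built in is that a lujvo must end in a vowel, so a CVC rafsi cannot fill the final position. Hyphenation is ignored, so any short rafsi counts in a non-final position.

```python
VOWELS = "aeiou"

def has_usable_short_rafsi(gismu, is_final, assignments):
    """True if the gismu has a short rafsi that fits this position."""
    candidates = assignments.get(gismu, [])
    if is_final:
        # A lujvo must end in a vowel, so only CCV and CVV rafsi fit here.
        return any(r[-1] in VOWELS for r in candidates)
    # Non-final position: any short rafsi counts (hyphenation is ignored).
    return len(candidates) > 0

def efficiency(positions, assignments):
    """positions: list of (gismu, is_final, weight) rafsi positions."""
    total = sum(w for _, _, w in positions)
    covered = sum(w for g, f, w in positions
                  if has_usable_short_rafsi(g, f, assignments))
    return covered / total

# Hypothetical sample data, for illustration only (not the real rafsi list).
assignments = {"barda": ["bad"], "zmadu": ["mau", "zma"]}
positions = [
    ("barda", False, 1.0),  # covered: bad works in non-final position
    ("barda", True, 1.0),   # not covered: bad is CVC
    ("zmadu", True, 1.0),   # covered: mau or zma
    ("skicu", True, 1.0),   # not covered: no short rafsi in this sample
]
score = efficiency(positions, assignments)
print(f"{score:.1%}")
```

The weight field is where older and newer data could be weighted differently, such as the double weight mentioned for recent usage.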
With the new data, this score drops to 92%, which is not that much of a
change. But we're talking some 10000-20000 words, so 2.5% is some
250-500 lujvo that use long-form rafsi, and these are presumably all
from the new words. Since the new data was only around 2500 lujvo, even
weighted at about double weight, you can see that the efficiency of the
current rafsi assignments is nowhere near 94.5% or 92%. The data I have
shows several gismu with a half dozen lujvo and no rafsi assigned.
Twery's fetish has added 30 words that use cinse (luckily weighted low,
because he seldom uses them), but I think skicu, for example, is a bigger
problem, with velskicu actually much more common than the bare gismu
itself (change the place structure?). The worst off 25 gismu in terms
of rafsi needs have an average of a half-dozen lujvo each where they
need rafsi.
Note that the statistics do not even allow for hyphenation - kaz and niz
are perfectly good rafsi for ka and ni in my statistics, regardless of a
significant consensus that they need better ones.
Doing a full tuning of the rafsi list, as I've been slogging at for the
last week, stands to raise that efficiency score only a couple of
percent, probably not even as high as the 94.5% it was at, because the
diverse usages that you and others have applied the language to have
meant that ever more gismu need some rafsi, and they aren't available.
CVV rafsi are especially dear, if it is desired to shorten much-used
words like cinse and skicu in final position. But nanba and nanca both
compete for na'a (with equal weights, but only one can win), and ritli,
rirni, and friti are currently in a dead heat for "ri'i" (finti
displaced friti for "fi'i" early in the game since people realized that
finti rather than ciska was the basis for author and artistic creation).
The change rate is running at 12-15% of the rafsi so far, and I'm 1/2
done and accelerating - but I've put off the tough decisions till the
trade-offs were clearer, and things are getting tough. So I get to ask for
advice exactly once, because if we spend time exchanging ideas at a
couple of days per exchange, the damned job will never get done.
1. As the premier maker of lujvo in actual Lojban text, how bothered are
you by the prospect of 15% of the rafsi changing? (I expect that we will
be able to write software that can build new words for each existing
lujvo, allowing for automatic update of the texts to the changed
language.) But this makes no allowance for:
a) the extent to which you've learned rafsi and hence don't look them up
(which as you know has been a problem with your texts anyway - you guess
wrong on some rafsi)
b) sound and rhyme patterns in poetry (Helsem's work, primarily not on
computer, isn't even in the statistics)
c) intentional choice of long word forms where short ones might exist.
d) related to c) - the scoring algorithm of the lujvo maker may differ
from what human beings might choose as the most preferable lujvo form.
2. Typical problems:
a) Actually this is the tough one hanging now: zmadu has both mau and
zma - there is no competition for zma, and a bit for mau, but mau is the
rafsi that most people use, and of course matches the related cmavo. By
freeing mau, it can be used for other words, forming a loosening chain
that eventually allows a CVV rafsi to be assigned to a word that doesn't
have one. In this case, the best chain is especially ugly: it gives
mau to cmalu (talk about a rafsi change causing a meaning change), then
cma to cmana, freeing ca'a for cabra, which then gives bra to barda
(which no longer needs 'bad' - and I can't remember what gets it). So
your translation would become brabra kevna. I have some half dozen usages of
barda in final position, all with significant frequency: superlative-big
mildly-big ni-big ka-big, etc., besides badbarda. It thus contributes
significantly to the efficiency score loss. The other final position
rafsi it can use is ba'a, which barna uses.
Based on pure statistics alone, if zmadu and cmalu don't change, barda
gets bra, cabra takes ca'a, and either cmana goes rafsi-less, gets a CVC
that doesn't help with hill (cmaca'a) and volcano (pojyca'a), or takes
ma'a from mamta, which has about equal need for it, and some historical
claim (JCB had ma'a for his word for mother). These kinds of choices
Nora finds distressing; she has little advice, and she votes no on
>every< change, unless there is consensus to go ahead - i.e. she wants
to hold to baseline rules even though the list is not baselined.
I'm inclined, with pain, to drop the CVV for mamta, or possibly cmana.
There is symmetry in a CCV for both cmalu and barda, and even I don't
like the radical change in meaning for "mau". But I have to admit that
in general I prefer wherever practical to let statistics decide when to
make a change because of the fear of cultural bias creeping in.
b) a lighter change - but showing the depth of difficulties. lojbab has
traditionally been humorously translated as "logical-soap", and Nora
based one comic strip on that interpretation. But it now turns out that
zbabu could also use "zab" which is currently not in use, letting blabi
have "bab" and either lanbi (protein - probably the one we'd choose) or
labno (wolf) get "lab". Neither of the two have been used in any lujvo
thus far, but protein seems likely to eventually be used in some, and to
also have use in making le'avla. So here we have a tuning decision
based on a hypothetical rather than actual use of a gismu in lujvo, as a
tradeoff against a single humorous use of the existing assignment
(logical white just doesn't fit the comic strip, and it isn't merely a
wording change, either).
c) Some other rafsi assignments I am considering cast in concrete. The
words brivla, selma'o, and le'avla will not change again - too much of
our materials has these embedded to consider a change. I'm actually
putting the cut-off of importance much lower than these words -
"badbarda" is almost memorable enough for me to lock it in, but then I
stop to realize that there is no evidence that anyone but jimc and
myself have ever read more than a couple of sentences. But bavlamdei is
on the line - I gave bav to balvi even though it has also gained bla
from blanu and no longer needs a CVC - the only word that could use bav
is bavmi and barley doesn't strike me as being a high-usage lujvo
source. But I expect that bav will fade away, and bavlamdei change to
blalamdei (or whatever, if the other rafsi change) - the word is much
used around here, but not much elsewhere to my knowledge.
3. What I expect to do when done is to present the new list as a whole
but also to try to pick out the multiple chains of changes from within
so I can attempt to pragmatically explain what caused each if it is
identifiable, and allow each chain, to the extent it is independent of
others, to be decided individually, rather than to approve the entire
change package as a single lump. But this is a lot of work if most
people don't really care. Is it worth the effort to you?
4. What does seem clear from Nora's resistance, is that the rafsi list
MUST be baselined when any changes from this exercise get decided. So I
intend a vote for baseline at LogFest, with additional voting from
yourself and other significants who cannot be present.
Opinions needed quickly on whatever this essay prompts in your mind.
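
For reference, the zmadu chain described under 2a) can be written as a sequence of moves applied to the assignment table. This is only a sketch: apply_chain is a hypothetical helper, the table lists only the rafsi named in that item (each gismu may hold others), and the new holder of "bad" is left empty because it is not stated.

```python
def apply_chain(assignments, moves):
    """Return a new table after applying (rafsi, from_gismu, to_gismu) moves.

    A to_gismu of None means the rafsi is freed with no new holder given.
    """
    table = {g: list(rs) for g, rs in assignments.items()}  # copy, don't mutate
    for rafsi, src, dst in moves:
        if rafsi not in table.get(src, []):
            raise ValueError(f"{src} does not hold {rafsi}")
        table[src].remove(rafsi)
        if dst is not None:
            table.setdefault(dst, []).append(rafsi)
    return table

# Only the rafsi mentioned in 2a); other holdings are omitted.
current = {
    "zmadu": ["mau", "zma"],
    "cmalu": ["cma"],
    "cmana": ["ca'a"],
    "cabra": ["bra"],
    "barda": ["bad"],
}
chain = [
    ("mau", "zmadu", "cmalu"),
    ("cma", "cmalu", "cmana"),
    ("ca'a", "cmana", "cabra"),
    ("bra", "cabra", "barda"),
    ("bad", "barda", None),  # new holder not stated
]
tuned = apply_chain(current, chain)
for gismu, rafsi in tuned.items():
    print(gismu, rafsi)
```

Writing each chain as its own list of moves is one way to present chains separately, so that an independent chain can be accepted or rejected without touching the others.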