Message-Id: <m0pvhvC-00006TC@xiron.pc.helsinki.fi>
Date:         Tue, 26 Apr 1994 03:49:07 -0400
Reply-To:     Logical Language Group <lojbab@ACCESS.DIGEX.NET>
Sender:       Lojban list <LOJBAN%CUVMB.bitnet@FINHUTC.hut.fi>
From:         Logical Language Group <lojbab@ACCESS.DIGEX.NET>
Subject:      Re: CONLANG digest 27
To:           Veijo Vilva <veion@XIRON.PC.HELSINKI.FI>
Content-Length: 8297
Lines: 142

What would I do differently if doing Lojban over?

Morphology:
Use w and y for the semivowels in diphthongs.  This might have made the
visible apostrophe rarer or even unnecessary.

Use doubled r/m/n/l for the syllabic forms of these consonants.  One of the
few good ideas from TLI Loglan after the split, though JCB uses them only
in borrowings and not throughout the language.

Phonology:
I would consider using syllabic r/n/l for all "hyphens" in lujvo, thus
freeing schwa from the vowel set for use as the main buffer.  I'm not
certain this is necessary or worthwhile, but would have considered it.
This also would free up 'y' for the role mentioned above.

Word-making:
There was a bug in our algorithm for word making, and in any case the
number of manual errors in collating the data was higher than it needed
to be.  Faster computers would make redoing the whole thing take a fraction
of the time.  (The bug was a rounding error in computing scores - "mamta",
which actually should get a 100% score, only gets 98%).

There were also some problems in how we handled affricates in Lojbanizing
source words - a LOT of Chinese words Lojbanized with a consonant 'c'
leading to dense packing in that letter and other fricatives.  Lojban is thus
marginally harder to pronounce accurately than it needs to be - I often
feel like I am saying the tongue twister "She sells sea shells ..." which I
do particularly poorly.  There were other problems in Chinese and Russian,
and even in English, because we tried to make Lojbanized pronunciations
true to an IPA rendering of the component letters.  Thus those three languages
Lojbanized with a lot of schwas (which we mapped to Lojban 'a' in most cases),
when sticking to the visual word would have been more true to the *phonemic*
source word, giving greater word contrast and probably recognition.  The
most obvious symptom of this is the near absence of 'o' in Lojban gismu made
by the word-making algorithm, because it usually mapped to some other letter
as actually realized in pronunciation.

We also misinterpreted one word-making rule JCB originally had - leading to
a recognition score in some combinations where there should have been none,
by his original intent.  This is because we did not actually see the
original rules at the time we remade the words, but instead used someone's
later interpretation that was actually incorrect (but JCB had never noticed
the error and had published it).

I would also have experimented with other ways to score Arabic in particular,
which is poorly represented in Lojban because the algorithm gives
vowel matching the same weight as consonant matching in word recognition,
an assumption that does not hold, especially for Arabic (and probably does
not fully hold for most languages).  This should go under a more general
rubric of experimenting
more with the algorithm to see what various modifications might have done
to the resulting word set, and its recognition.  We might have even tried
conducting the experiments that JCB has said he conducted in the 50s, but
never documented, trading off various algorithms based on actual recognition
results.
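
The weighting point can be illustrated with a toy sketch.  This is not the
actual gismu-making algorithm, whose details are not given here; it is a
simple weighted longest-common-subsequence score, assumed purely for
illustration.  The function name, the weights, the source form "kitab" and
the candidate "kutbo" are all hypothetical.

```python
VOWELS = set("aeiou")

def weighted_match(candidate, source, consonant_weight=1.0, vowel_weight=1.0):
    """Toy recognition score in [0, 1]: weighted longest common subsequence
    of `source` within `candidate`, divided by the total weight of `source`.
    Hypothetical illustration only, not the real word-making algorithm."""
    def weight(ch):
        return vowel_weight if ch in VOWELS else consonant_weight

    # best[i][j] = heaviest common subsequence of candidate[:i] and source[:j]
    best = [[0.0] * (len(source) + 1) for _ in range(len(candidate) + 1)]
    for i, c in enumerate(candidate, 1):
        for j, s in enumerate(source, 1):
            if c == s:
                best[i][j] = best[i - 1][j - 1] + weight(c)
            else:
                best[i][j] = max(best[i - 1][j], best[i][j - 1])

    total = sum(weight(ch) for ch in source)
    return best[-1][-1] / total if total else 0.0

# A made-up candidate that keeps the consonants k-t-b but none of the vowels.
equal = weighted_match("kutbo", "kitab")                      # 3 of 5 letters
damped = weighted_match("kutbo", "kitab", vowel_weight=0.5)   # vowels count half
print(equal, damped)
```

With equal weights the candidate is credited for only three of five letters;
once vowels count for less, the same candidate scores higher because all of
the consonants survive.  Which weights would reflect real recognition is
exactly the kind of thing that would have needed experiment.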

Of course, once the words are done they still have to be memorized, so
there are a lot more 'like to do's in this area, most of which would have had
little real effect on the language, but might have made it a bit more
aesthetically pleasing and/or recognizable.

More important, I would have been more willing to spread words around for
better rafsi assignments.  Though redundancy is a factor in too many rafsi,
the language SHOULD be such that every rafsi is usable, rather than 80%
of them.  Recognition scores are probably not as important as rafsi assignments
in the final language usage.

I would have liked to have had someone working earlier on lujvo-making.
Lojban would be much more used today if we had had Nick Nicholas's catalog
of lujvo back right after we had finished the gismu list.  Lojban allows words
to be made up on the fly, and be understandable, but the adult, educated,
audience that we now have, wants and expects to use a stable vocabulary
higher than 1200 words.  Even if they don't memorize them from the
start, people want to have the ability to call upon a word, even a sloppy one,
to get their concept across quickly on non-trivial subjects.  Instead we have
the curious situation of Lojban purists who mull endlessly over exactly the
right tanru (metaphor) to use in making a compound (lujvo) that will capture
their exact meaning.  This is desirable in the long run, but is a handicap
in the short term, in that even our best Lojban speakers seem to feel that
Lojban is 'hard' to write, giving arguments that essentially amount to
feeling that the language demands a superhuman mastery of the semantics of
the vocabulary.

We aren't superhuman.  Native Lojban speakers may well have such mastery.
Lojban needs a lot of words, even inexact ones, so people have some meat
to debate (and build their own language usage on).

We've delayed the dictionary, in effect, to allow us to incorporate Nick's
3000-odd lujvo, which may give us as many as 10K English entries in the
dictionary, instead of the existing 1400.  (There are other reasons of course,
including my bad work habits, and two lovely but time-consuming kids, but...)

Grammar - I would make few if any changes - we have made them as we have
gone along - and I am happy with the direction things took.  I wish we had
had John Cowan doing the analysis he has done in the last 3 years, during the
first year.  He might even have some changes he would have made back then,
but I don't.  Even if he hadn't, we would probably have gotten to the current
situation much more quickly, and the more stable grammar would have increased
people's confidence.

Incidentally, I don't agree with And:
BL> Of the features of Lojban that I dislike, by far the most significant
BL> & least nitpicky is the way it is so in-your-face-ly a constituency
BL> grammar, what with its terminators & suchlike. Much of my work on
BL> natural language is based on the assumption that phrase structure
BL> is a fiction of neostructuralist linguistics. I have sometimes wondered
BL> whether or how Lojban could be altered to turn it into a dependency-based
BL> (& therefore to my mind naturalistic) language.

I think this is one of Lojban's strong points, in that it makes the language
easy to teach and learn, at least in the grammatical area.  The net result
of all the terminators, and especially the "cu" separator, being elidable,
is that the language IS more or less 'local' in grammar and hence
like his dependency grammar.  Most of the formal structure of Lojban is
to enable complex sentences to be unambiguous, the type of sentences seldom
uttered in spoken use, and not even that frequent in written use.  The average
Lojban speaker needs to remember to use about 3 of the terminators/separators
(ku, cu, and kei) to cover most of the usages in the spoken language.  Since
"ku" and "kei" are directly mappable to commas that are found in written
English (and we speak these pauses in the spoken English language), spoken
Lojban isn't much more nitpicky than English.  The difference is that in
English, omitting a comma doesn't always lead to misinterpretation except
in more complex written text (i.e. legal writing and the like, where commas
can be VITAL and English is even more in-your-face than Lojban because the
formality is so 'unnatural' to the way of thinking of the fluent English
speaker).  Misinterpretation is more likely in Lojban in such circumstances,
at least partially because of the lower redundancy resulting from our tight
word-packing and avoidance of polysemy.

And of course Lojban ADDS to naturalism with some of its more dependency-like
structures.  I think that our tanru and attitudinal semantics are much more
like what you probably mean by dependency-based.  But they are much less
restricted than the corresponding features in most natural languages.

Enough for tonight.  I might think of more if I was awake %^)

lojbab
----
lojbab                           Note new address:    lojbab@access.digex.net
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                        703-385-0273