Return-Path: <@FINHUTC.HUT.FI:LOJBAN@CUVMB.BITNET>
Received: from FINHUTC.hut.fi by xiron.pc.helsinki.fi with smtp
    (Linux Smail3.1.28.1 #1) id m0pvhvC-00006TC; Tue, 26 Apr 94 10:51 EET DST
Message-Id:
Received: from FINHUTC.HUT.FI by FINHUTC.hut.fi (IBM VM SMTP V2R2)
    with BSMTP id 7080; Tue, 26 Apr 94 10:51:08 EET
Received: from SEARN.SUNET.SE (NJE origin MAILER@SEARN) by FINHUTC.HUT.FI
    (LMail V1.1d/1.7f) with BSMTP id 7078; Tue, 26 Apr 1994 10:51:08 +0200
Received: from SEARN.SUNET.SE (NJE origin LISTSERV@SEARN) by SEARN.SUNET.SE
    (LMail V1.2a/1.8a) with BSMTP id 3919; Tue, 26 Apr 1994 09:49:37 +0200
Date: Tue, 26 Apr 1994 03:49:07 -0400
Reply-To: Logical Language Group
Sender: Lojban list
From: Logical Language Group
Subject: Re: CONLANG digest 27
X-To: conlang@diku.dk
X-cc: lojban@cuvmb.cc.columbia.edu
To: Veijo Vilva
Content-Length: 8297
Lines: 142

What would I do differently if doing Lojban over?

Morphology: Use w and y for the semivowels in diphthongs. This might have made the visible apostrophe rarer or even unnecessary. Use doubled r/m/n/l for the syllabic forms of these consonants. One of the few good ideas from TLI Loglan after the split, though JCB uses them only in borrowings and not throughout the language.

Phonology: I would consider using syllabic r/n/l for all "hyphens" in lujvo, thus freeing schwa from the vowel set for use as the main buffer. I'm not certain this is necessary or worthwhile, but would have considered it. This also would free up 'y' for the role mentioned above.

Word-making: There was a bug in our algorithm for word making, and in any case the number of manual errors in collating the data was higher than it needed to be. Faster computers would make redoing the whole thing take a fraction of the time. (The bug was a rounding error in computing scores - "mamta", which actually should get a 100% score, only gets a 98%.)
There were also some problems in how we handled affricates in Lojbanizing source words - a LOT of Chinese words Lojbanized with a consonant 'c', leading to dense packing in that letter and other fricatives. Lojban is thus marginally harder to pronounce accurately than it needs to be - I often feel like I am saying the tongue twister "She sells sea shells ...", which I do particularly poorly.

There were other problems in Chinese and Russian, and even in English, because we tried to make Lojbanized pronunciations true to an IPA rendering of the component letters. Thus those three languages Lojbanized with a lot of schwas (which we mapped to Lojban 'a' in most cases), when sticking to the visual word would have been more true to the *phonemic* source word, giving greater word contrast and probably recognition. The most obvious symptom of this is the near absence of 'o' in Lojban gismu made by the word-making algorithm, because it usually mapped to some other letter as actually realized in pronunciation.

We also misinterpreted one word-making rule JCB originally had - leading to a recognition score in some combinations where there should have been none, by his original intent. This is because we did not actually see the original rules at the time we remade the words, but instead used someone's later interpretation that was actually incorrect (but JCB had never noticed the error and had published it).

I would also have experimented with other ways to score Arabic in particular, which is poorly represented in Lojban because the algorithm gives equal weight to vowel matching and consonant matching in word recognition, which is not true especially for Arabic (and probably less than true for most languages). This should go under a more general rubric of experimenting more with the algorithm to see what various modifications might have done to the resulting word set, and its recognition.
We might have even tried conducting the experiments that JCB has said he conducted in the 50s, but never documented, trading off various algorithms based on actual recognition results.

Of course, once the words are done they still have to be memorized, so there are a lot more 'like to do's in this area, most of which would have had little real effect on the language, but might have made it a bit more aesthetically pleasing and/or recognizable.

More important, I would have been more willing to spread words around for better rafsi assignments. Though redundancy is a factor in too many rafsi, the language SHOULD be such that every rafsi be usable rather than 80% of them. Recognition scores are probably not as important as rafsi assignments in the final language usage.

I would have liked to have had someone working earlier on lujvo-making. Lojban would be much more used today if we had had Nick Nicholas's catalog of lujvo back right after we had finished the gismu list. Lojban allows words to be made up on the fly, and be understandable, but the adult, educated audience that we now have wants and expects to use a stable vocabulary larger than 1200 words. Even if they don't memorize them from the start, people want to have the ability to call upon a word, even a sloppy one, to get their concept across quickly on non-trivial subjects.

Instead we have the curious situation of Lojban purists who mull endlessly over exactly the right tanru (metaphor) to use in making a compound (lujvo) that will capture their exact meaning. This is desirable in the long run, but is a handicap in the short term, in that even our best Lojban speakers seem to feel that Lojban is 'hard' to write, giving arguments that essentially amount to feeling that the language demands a superhuman mastery of the semantics of the vocabulary. We aren't superhuman. Native Lojban speakers may well have such mastery.
Lojban needs a lot of words, even inexact ones, so people have some meat to debate (and build their own language usage on). We've delayed the dictionary, in effect, to allow us to incorporate Nick's 3000-odd lujvo, which may give us as many as 10K English entries in the dictionary, instead of the existing 1400. (There are other reasons of course, including my bad work habits, and two lovely but time-consuming kids, but...)

Grammar - I would make few if any changes to - we have made them as we have gone along - and I am happy with the direction things took. I wish we had had John Cowan doing the analysis he has done in the last 3 years, during the first year. He might even have some changes he would have made back then, but I don't. Even if he hadn't, we would probably have gotten to the current situation much more quickly, and the more stable grammar would have increased people's confidence.

Incidentally, I don't agree with And:

BL> Of the features of Lojban that I dislike, by far the most significant
BL> & least nitpicky is the way it is so in-your-face-ly a constituency
BL> grammar, what with its terminators & suchlike. Much of my work on
BL> natural language is based on the assumption that phrase structure
BL> is a fiction of neostructuralist linguistics. I have sometimes wondered
BL> whether or how Lojban could be altered to turn it into a dependency-based
BL> (& therefore to my mind naturalistic) language.

I think this is one of Lojban's strong points, in that it makes the language easy to teach and learn, at least in the grammatical area. The net result of all the terminators, and especially the "cu" separator, being elidable, is that the language IS more or less 'local' in grammar and hence like his dependency grammar. Most of the formal structure of Lojban is to enable complex sentences to be unambiguous, the type of sentences seldom uttered in spoken use, and not even that frequent in written use.
The average Lojban speaker needs to remember to use about 3 of the terminators/separators (ku, cu, and kei) to cover most of the usages in the spoken language. Since "ku" and "kei" are directly mappable to commas that are found in written English (and we speak these pauses in the spoken English language), spoken Lojban isn't much more nitpicky than English. The difference is that in English, omitting a comma doesn't always lead to misinterpretation except in more complex written text (i.e. legal writing and the like, where commas can be VITAL and English is even more in-your-face than Lojban because the formality is so 'unnatural' to the way of thinking of the fluent English speaker). Misinterpretation is more likely in Lojban in such circumstances, at least partially because of the lower redundancy resulting from our tight word-packing and avoidance of polysemy.

And of course Lojban ADDS to naturalism with some of its more dependency-like structures. I think that our tanru and attitudinal semantics are much more like what you probably mean by dependency-based. But they are much less restricted than the corresponding features in most natural languages.

Enough for tonight. I might think of more if I was awake %^)

lojbab

----
lojbab Note new address: lojbab@access.digex.net
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273