Received: from VMS.DC.LSOFT.COM (vms.dc.lsoft.com [205.186.43.2])
        by locke.ccil.org (8.6.9/8.6.10) with ESMTP id MAA19330 for ;
        Tue, 24 Oct 1995 12:41:01 -0400
Message-Id: <199510241641.MAA19330@locke.ccil.org>
Received: from PEACH.EASE.LSOFT.COM (205.186.43.4)
        by VMS.DC.LSOFT.COM (LSMTP for OpenVMS v1.0a) with SMTP id 60CF77B9 ;
        Tue, 24 Oct 1995 12:33:25 -0400
Date: Tue, 24 Oct 1995 12:29:38 EDT
Reply-To: jorge@PHYAST.PITT.EDU
Sender: Lojban list
From: jorge@PHYAST.PITT.EDU
Subject: Re: Incredible!
X-To: lojban@cuvmb.cc.columbia.edu
To: John Cowan
Status: OR
X-From-Space-Date: Tue Oct 24 12:41:04 1995
X-From-Space-Address: LOJBAN%CUVMB.BITNET@UBVM.CC.BUFFALO.EDU

> >What criterion would this idea have failed to meet?
>
> Well, the obvious one seems to be that the gismu space is so constrained
> that assignment of gismu would have to be nearly random.

Not really, I'm sure you could still get a very high correlation with
the Chinese and English words, which are mostly monosyllables. I doubt
it would be much more random than the current assignment.

You can also add the rafsi KVV, with any CC to form the gismu KVCCV.
The only drawback is that these rafsi need the -r- glue sometimes, but
that is not too bad. "y" is still never needed, and the rafsi are still
unique for a given gismu.

> It is NOT clear that the word-recognition scores algorithm is that effective
> for Lojban gismu making, but I think that there is considerable likelihood
> that the assignments are better than random.

There wouldn't be a significant change in this respect.

> A consequence also is that Lojban words have an uneven phoneme frequency,
> and the frequencies of the phonemes are not unlike the frequencies of
> natural languages the words were built from.

That would still be the case, since for most gismu you would be adding
an arbitrary CC or KV.

> A few people have noticed that
> although Lojban words look strange, as a text/phoneme string the language
> sounds natural.
> It is unclear whether a flat distribution would have this
> trait.

My proposal wouldn't have to have a flat distribution.

> I can't remember how large the current gismu space is, but it is well over 20K
> if my memory is worth anything. Even including in your less-good rafsi
> forms would lead to only 5K in the gismu space.

Current gismu space:   48*5*17*5 + 17*5*164*5 = 90100
Proposed gismu space:  46*5*17*5 + 13*5*164*5 = 72850

It is the same space, minus the gismu starting with l, m, n, r.

> I also think that design-wise we would not have found only 1400 gismu as
> an upper limit to be too constraining. When we started designing the
> language we had only 1000 gismu, and this grew to the current 1300.

Ok, with all the additions, there are 5167 possible rafsi, considering
all the forms CCV, CCVN, KVN, KVKN, KVV. Even if you get up to 2000
there is plenty of redundancy.

> It may be baselined for the foreseeable future, but I don't think that there
> was any evidence back in 1987 that the number of gismu would stop just at this
> particular point. Indeed some of us figured we would end up close to 2000,
> based on observations that that number seems to commonly occur as a count
> of roots, basic words, etc. in various natural languages. It may even happen
> eventually that Lojban will get that high, though not for a lot of years.

That's not a problem.

> It wasn't until the first gismu list baselining in 1989 or 1990 (can't remember
> which year) that the consensus settled towards fewer rather than more gismu,
> and by that time the morphology was pretty much set in concrete since we
> baselined it first (for obvious reasons - you don't want the rules for what
> constitutes a word to change after you have started making words).

I think that this was really the problem all along. Even when the gismu
were originally made, the idea was to reproduce what JCB had done.
I can't believe that a simpler morphology can't be found if working
from scratch.
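For anyone who wants to check the figures above, here is a minimal Python
sketch that recomputes the two gismu-space totals and the rafsi total. The
gismu factors are copied directly from the two sums in this message. The
class sizes used for the rafsi count are an assumption inferred from the
numbers rather than stated here: 46 initial pairs for CC, 13 consonants for
K, 4 finals (l, m, n, r) for N, 5 vowels for V, and 29 vowel pairs for VV
(4 diphthongs plus 25 of the form V'V). The function and parameter names
are hypothetical.

```python
def gismu_space(initial_cc, first_c, vowels=5, any_c=17, medial_cc=164):
    """Count CCVCV + CVCCV shapes, using the factors quoted in the message."""
    ccvcv = initial_cc * vowels * any_c * vowels
    cvccv = first_c * vowels * medial_cc * vowels
    return ccvcv + cvccv


def rafsi_space(cc=46, k=13, n=4, v=5, vv=29):
    """Count rafsi of the forms CCV, CCVN, KVN, KVKN, KVV.

    The class sizes are assumptions inferred from the quoted total.
    """
    ccv = cc * v
    ccvn = cc * v * n
    kvn = k * v * n
    kvkn = k * v * k * n
    kvv = k * vv
    return ccv + ccvn + kvn + kvkn + kvv


current = gismu_space(initial_cc=48, first_c=17)   # 48*5*17*5 + 17*5*164*5
proposed = gismu_space(initial_cc=46, first_c=13)  # 46*5*17*5 + 13*5*164*5
rafsi = rafsi_space()

print(current, proposed, rafsi)
```

Under those assumptions the three values come out to 90100, 72850 and 5167,
matching the figures given above.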

I just thought of another possibility: make each gismu identical to
its combining form, all of them of form BAL, where:

B: b, c, d, f, g, j, k, p, s, t, v, x, z,
   bl, br, cf, ck, cl, cm, ..., zv.          (total: 13+46=59)

A: a, e, i, o, u, ai, au, ei, oi,
   a'a, a'e, ..., u'u.                        (total: 34)

L: l, m, n, r                                 (total: 4)

That gives a space of 8024 from which to select the 1500 or so gismu,
so there is no reason for the distribution of morphemes to be flat.
The most common roots would be monosyllabic and the less common could
have two syllables. This would even allow us to forget about the
stress rule.

To separate lujvo from tanru all that would be needed is a separator
like {co} for tanru. For example, if {xun} was "red" and {zda} was
"house", then {xunzda} would be red-house, and the tanru could be
{zda co xun} or {xunvau zda} (or something else if {vau} can't be used
like that). Even with the additional separator, tanru would not be
longer than they are now.

That is just one possibility; I'm sure there are millions more that
would allow lujvo to be made in a simple manner (which is the biggest
problem with the current system).

Obviously it can't be changed at this point, but I don't think that
what we have is the best that we could have given a reasonable set of
criteria. It may be the best only if we take the list of gismu as a
given, but that is a historical constraint, so And was right about
the 25 years.

Jorge
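
The size of the BAL space described above can be checked with a short
Python sketch. It enumerates the 34 "A" elements explicitly (5 vowels,
4 diphthongs, 25 pairs of the form V'V) and takes the counts for B
(13 single consonants plus 46 pairs) and L (4) as given in the message,
since the full list of initial pairs is elided there. The names `nuclei`
and `bal_space` are hypothetical.

```python
from itertools import product

VOWELS = "aeiou"
DIPHTHONGS = ["ai", "au", "ei", "oi"]


def nuclei():
    """Return the 'A' elements: single vowels, diphthongs, and V'V pairs."""
    apostrophe_pairs = [a + "'" + b for a, b in product(VOWELS, repeat=2)]
    return list(VOWELS) + DIPHTHONGS + apostrophe_pairs


def bal_space(n_single=13, n_pairs=46, n_final=4):
    """Count B-A-L words; the B and L counts are taken from the message."""
    return (n_single + n_pairs) * len(nuclei()) * n_final


print(len(nuclei()), bal_space())
```

This gives 34 nuclei and a space of 59 * 34 * 4 = 8024 forms.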