Message-Id: <199510241641.MAA19330@locke.ccil.org>
Date:         Tue, 24 Oct 1995 12:29:38 EDT
Reply-To: jorge@PHYAST.PITT.EDU
Sender: Lojban list <LOJBAN@CUVMB.BITNET>
From: jorge@PHYAST.PITT.EDU
Subject:      Re: Incredible!
To: John Cowan <cowan@LOCKE.CCIL.ORG>

> >What criterion would this idea have failed to meet?
>
> Well, the obvious one seems to be that the gismu space is so constrained
> that assignment of gismu would have to be nearly random.

Not really. I'm sure you could still get a very high correlation with
the Chinese and English words, which are mostly monosyllables. I doubt
it would be much more random than the current assignment.

You can also add the rafsi KVV, with any CC to form the gismu KVCCV. The
only drawback is that these rafsi need the -r- glue sometimes, but that is
not too bad. "y" is still never needed, and the rafsi are still unique
for a given gismu.

> It is NOT clear that the word-recognition scores algorithm is that effective
> for Lojban gismu making, but I think that there is considerable likelihood
> that the assignments are better than random.

There wouldn't be a significant change in this respect.

> A consequence also is that Lojban words have an uneven phoneme frequency,
> and the frequencies of the phonemes are not unlike the frequencies of
> natural languages the words were built from.

That would still be the case, since for most gismu you would be adding
an arbitrary CC or KV.

> A few people have noticed that
> although Lojban words look strange, as a text/phoneme string the language
> sounds natural.  It is unclear whether a flat distribution would have this
> trait.

My proposal wouldn't require a flat distribution.

> I can't remember how large the current gismu space is, but it is well over 20K
> if my memory is worth anything.  Even including in your less-good rafsi
> forms would lead to only 5K in the gismu space.

Current gismu space: 48*5*17*5 + 17*5*164*5 = 90100
Proposed gismu space: 46*5*17*5 + 13*5*164*5 = 72850

It is the same space, minus the gismu starting with l,m,n,r.
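
As a quick check of the two totals above, here is a minimal Python sketch.
The function and variable names are mine, and I am assuming the factors
read as follows: the first term counts CCVCV shapes (48 or 46 initial
consonant pairs, 5 vowels, 17 consonants, 5 vowels) and the second counts
CVCCV shapes (17 or 13 initial consonants, 5 vowels, 164 medial consonant
pairs, 5 vowels).

```python
VOWELS = 5          # a, e, i, o, u
CONSONANTS = 17     # all consonants, usable in non-initial position
MEDIAL_PAIRS = 164  # medial CC pairs (figure taken from the sums above)


def gismu_space(initial_pairs, initial_consonants):
    """Count CCVCV + CVCCV shapes for a given set of allowed initials."""
    ccvcv = initial_pairs * VOWELS * CONSONANTS * VOWELS
    cvccv = initial_consonants * VOWELS * MEDIAL_PAIRS * VOWELS
    return ccvcv + cvccv


current = gismu_space(initial_pairs=48, initial_consonants=17)
proposed = gismu_space(initial_pairs=46, initial_consonants=13)
print(current, proposed, current - proposed)
```

Only the two "initial" arguments change between the calls, which is the
point: the proposal just drops the shapes beginning with l, m, n, r.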

> I also think that design-wise we would not have found only 1400 gismu as
> an upper limit to be too constraining.  When we started designing the
> language we had only 1000 gismu, and this grew to the current 1300.

OK, with all the additions, there are 5167 possible rafsi, considering all
the forms CCV, CCVN, KVN, KVKN, KVV. Even if you get up to 2000 gismu, there
is plenty of redundancy.

> It may be baselined for the foreseeable future, but I don't think that there
> was any evidence back in 1987 that the number of gismu would stop just at this
> particular point.  Indeed some of us figured we would end up close to 2000,
> based on observations that that number seems to commonly occur as a count
> of roots, basic words, etc. in various natural languages.  It may even happen
> eventually that Lojban will get that high, though not for a lot of years.

That's not a problem.

> It wasn't until the first gismu list baselining in 1989 or 1990 (can't remember
> which year) that the consensus settled towards fewer rather than more gismu,
> and by that time the morphology was pretty much set in concrete since we
> baselined it first (for obvious reasons - you don't want the rules for what
> constitutes a word to change after you have started making words).

I think that this was really the problem all along. Even when the gismu were
originally made, the idea was to reproduce what JCB had done. I can't believe
that a simpler morphology couldn't be found when working from scratch.

I just thought of another possibility: make every gismu identical to
its combining form, each of the form BAL, where:

B: b, c, d, f, g, j, k, p, s, t, v, x, z, bl, br, cf, ck, cl, cm, ..., zv.
   (total: 13+46=59)
A: a, e, i, o, u, ai, au, ei, oi, a'a, a'e, ..., u'u.
   (total: 34)
L: l, m, n, r
   (total: 4)

That gives a space of 8024 from which to select the 1500 or so gismu,
so there is no reason for the distribution of morphemes to be flat. The
most common roots would be monosyllabic and the less common could have
two syllables.
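
The size of the BAL space can be checked with a short Python sketch. The
names are mine. The B slot is only counted (13 single consonants plus 46
pairs), since the pairs are not all listed above; for the A slot I am
assuming the ellipsis stands for all 25 V'V combinations, which matches
the stated total of 34.

```python
from itertools import product

VOWELS = "aeiou"

# B: counted only, because the 46 consonant pairs are not enumerated here.
b_count = 13 + 46

# A: single vowels, the four diphthongs, and every V'V pair (assumed).
nuclei = (
    list(VOWELS)
    + ["ai", "au", "ei", "oi"]
    + [f"{v1}'{v2}" for v1, v2 in product(VOWELS, repeat=2)]
)

# L: the four final consonants.
finals = ["l", "m", "n", "r"]

space = b_count * len(nuclei) * len(finals)
print(len(nuclei), space)
```

With about 1500 gismu to place, that leaves well under a quarter of the
space occupied, so the choice of shapes can follow the source words rather
than being forced.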

This would even allow us to forget about the stress rule. To separate
lujvo from tanru all that would be needed is a separator like {co}
for tanru. For example, if {xun} were "red" and {zda} were "house",
then {xunzda} would be red-house, and the tanru could be {zda co xun}
or {xunvau zda} (or something else if {vau} can't be used like that).
Even with the additional separator, tanru would not be longer than
they are now.

That is just one possibility. I'm sure there are millions more that
would allow lujvo to be made in a simple manner (making lujvo is the
biggest problem with the current system).

Obviously it can't be changed at this point, but I don't think that
what we have is the best that we could have given a reasonable set of
criteria. It may be the best only if we take the list of gismu as a
given, but that is a historical constraint, so And was right about the
25 years.

Jorge