Message-Id: <m0qqWnA-00005XC@xiron.pc.helsinki.fi>
Date:         Thu, 29 Sep 1994 21:25:17 -0400
Reply-To:     Erik Rauch <rauch@CS.YALE.EDU>
Sender:       Lojban list <LOJBAN%CUVMB.bitnet@FINHUTC.hut.fi>
From:         Erik Rauch <rauch@CS.YALE.EDU>
Subject:      The lujvo-making algorithm
To:           Veijo Vilva <veion@XIRON.PC.HELSINKI.FI>
Content-Length: 2284
Lines: 41

The release of the draft dictionary rather urgently brings up an issue
which I don't remember being discussed in my two years on the list: the
unity, if any, of the body of lujvo. I am assuming we are aiming for a
single body, in that future day when lojban is a language with a large
corpus and many speakers. How are we to deal with different lojbanists
having different preferences for generating them, and is the current
algorithm the best one?

Is a lujvo that uses the three-letter rafsi whenever possible always the
best lujvo? I'm thinking specifically of cases where the only three-letter
rafsi has as many syllables (two) as the four- or five-letter one. The two
qualities I think you want to maximize in a lujvo are (1) shortness,
measured in number of syllables, and (2) recognizability, that is,
similarity to the corresponding tanru - in that order. (This is when you're
not writing poetry or otherwise paying much attention to the sound.)

As for #1, there's been some debate on the conlang list on how good this is
as a measure of the length of time needed to say something, but in lojban,
with its similar-length vowels and lack of large consonant clusters, it is
a pretty good measure. My concern here is that we are instead minimizing
the number of _letters_ (not counting the apostrophe).

Should sa'urmi'e (sarcu minde) not be sarcyminde, for example? Or vi'ecpe
(vitke cpedu) not be vitkycpe? You get more similarity to the tanru without
increasing the time it takes to say them (or by increasing it only a tiny
amount).
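
To make the comparison concrete, here is a rough Python sketch that counts
both measures for the pairs above. It is only an illustration under stated
assumptions: the function names are made up for this example, and the
syllable count is simplified to one syllable per vowel nucleus, with ai, ei,
oi and au treated as single nuclei and nothing else about Lojban phonology
modelled.

```python
import re

# Assumption: one syllable per vowel nucleus. The diphthongs ai, ei, oi, au
# count as a single nucleus; the apostrophe always separates two nuclei.
# Other glides and syllabic consonants are ignored in this sketch.
NUCLEUS = re.compile(r"ai|ei|oi|au|[aeiouy]")


def syllables(word):
    """Approximate syllable count of a lujvo."""
    return len(NUCLEUS.findall(word.lower()))


def letters(word):
    """Letter count, not counting the apostrophe."""
    return len(word.replace("'", ""))


# Pairs taken from the examples above: (short-rafsi form, longer-rafsi form).
PAIRS = [("sa'urmi'e", "sarcyminde"), ("vi'ecpe", "vitkycpe")]

for short, longer in PAIRS:
    print(f"{short}: {syllables(short)} syllables, {letters(short)} letters")
    print(f"{longer}: {syllables(longer)} syllables, {letters(longer)} letters")
```

Under these assumptions each pair comes out with the same number of
syllables, while the longer-rafsi form has more letters.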

Then there's the issue of simple abstraction. nunklama seems to be
preferred over nunkla when not compounded with something else. Why? I think
there's a good reason. kamblanu is better than kambla, even though kambla
would save two phonemes and a syllable to boot.

Is it already too late to worry about this?
(Side note: Although a considerable number of man-hours have gone into the
lujvo list and the dictionary, which already contain many lujvo made with
the current algorithm, it would not be hard to come up with a program to
convert lujvo automatically; you'd just have to encode the above into an
alternate lujvo algorithm. You could even have one that cranks through
mixed text, replacing all parsable lujvo.)
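
As a rough idea of what the text-converting version might look like, here is
a minimal Python sketch. It does not parse lujvo at all: it assumes a
hypothetical lookup table (ALTERNATES, filled here only with the two examples
from this message) that maps a current lujvo to its alternate form, and it
leaves every other word untouched. A real converter would have to split each
lujvo into rafsi and rebuild it with the alternate algorithm.

```python
import re

# Hypothetical table for illustration only: current lujvo -> alternate form.
# A real program would derive these by parsing rafsi, not by lookup.
ALTERNATES = {
    "sa'urmi'e": "sarcyminde",
    "vi'ecpe": "vitkycpe",
}

# A candidate word is a run of lowercase letters and apostrophes.
WORD = re.compile(r"[a-z']+")


def convert(text, table=ALTERNATES):
    """Replace every word found in the table; leave all other text as is."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), text)


print(convert("le sa'urmi'e cu vi'ecpe"))
```

Because anything not in the table passes through unchanged, the same function
can be run over mixed Lojban and English text.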


| Erik Rauch                                             rauch@cs.yale.edu |