Date: Thu, 29 Sep 1994 21:25:17 -0400
Reply-To: Erik Rauch
Sender: Lojban list
From: Erik Rauch
Subject: The lujvo-making algorithm
X-To: Lojban List
To: Veijo Vilva

The release of the draft dictionary rather urgently raises an issue that I don't remember being discussed in my two years on the list: the unity, if any, of the body of lujvo. I am assuming we are aiming for a single body, for that future day when lojban is a language with a large corpus and many speakers. How are we to deal with different lojbanists having different preferences for generating lujvo, and is the current algorithm the best one? Is a lujvo that uses the three-letter rafsi whenever possible always the best lujvo? I'm thinking specifically of cases where the only three-letter rafsi has as many syllables (two) as the four- or five-letter one.

The two qualities I think you want to maximize in a lujvo are (1) shortness, measured in number of syllables, and (2) recognizability, that is, similarity to the corresponding tanru, in that order. (This is when you're not writing poetry or otherwise paying much attention to the sound.) As for #1, there's been some debate on the conlang list about how good a measure this is of the time needed to say something, but in lojban, with its similar-length vowels and lack of large consonant clusters, it is a pretty good one.
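To make the proposed ordering concrete, here is a minimal Python sketch that ranks candidate lujvo by syllable count first and recognizability second, next to a rule that simply minimizes letters (not counting apostrophes). It is an illustration under stated assumptions, not the official lujvo algorithm: `count_syllables` is a simplified counter (ai/ei/oi/au as one nucleus, i or u before a vowel as a glide, syllabic consonants ignored), and `recognizability` is a hypothetical proxy that counts how many letters each rafsi shares, as a prefix, with its source gismu. The candidate pairs are the sarcu minde and vitke cpedu examples discussed in this post.

```python
from typing import NamedTuple

VOWELS = set("aeiouy")
DIPHTHONGS = {"ai", "ei", "oi", "au"}


def count_syllables(word: str) -> int:
    """Approximate lojban syllable count: one per vowel nucleus.

    Simplifications (assumed, not a full syllabifier): ai/ei/oi/au count as
    one nucleus, i or u directly before another vowel is a glide, and
    syllabic consonants are ignored.
    """
    w = word.lower()
    count, i = 0, 0
    while i < len(w):
        if w[i] not in VOWELS:  # consonants and apostrophes
            i += 1
        elif w[i:i + 2] in DIPHTHONGS:  # diphthong: one nucleus
            count += 1
            i += 2
        elif w[i] in "iu" and w[i + 1:i + 2] in VOWELS:  # glide before a vowel
            i += 1
        else:
            count += 1
            i += 1
    return count


def letter_count(word: str) -> int:
    """Letters in the word, not counting apostrophes."""
    return len(word.replace("'", ""))


def recognizability(parts: list[tuple[str, str]]) -> int:
    """Hypothetical proxy: letters each rafsi shares, as a prefix, with its gismu."""
    total = 0
    for rafsi, gismu in parts:
        for a, b in zip(rafsi, gismu):
            if a != b:
                break
            total += 1
    return total


class Candidate(NamedTuple):
    word: str
    parts: list[tuple[str, str]]  # (rafsi used, source gismu)


def pick_by_letters(cands: list[Candidate]) -> Candidate:
    # Fewest letters wins (apostrophes ignored).
    return min(cands, key=lambda c: letter_count(c.word))


def pick_by_syllables(cands: list[Candidate]) -> Candidate:
    # Fewest syllables wins; ties go to the more recognizable candidate.
    return min(cands, key=lambda c: (count_syllables(c.word),
                                     -recognizability(c.parts)))


SARCU_MINDE = [
    Candidate("sa'urmi'e", [("sa'u", "sarcu"), ("mi'e", "minde")]),
    Candidate("sarcyminde", [("sarc", "sarcu"), ("minde", "minde")]),
]
VITKE_CPEDU = [
    Candidate("vi'ecpe", [("vi'e", "vitke"), ("cpe", "cpedu")]),
    Candidate("vitkycpe", [("vitk", "vitke"), ("cpe", "cpedu")]),
]

for cands in (SARCU_MINDE, VITKE_CPEDU):
    print(pick_by_letters(cands).word, "->", pick_by_syllables(cands).word)
```

Run as is, it prints `sa'urmi'e -> sarcyminde` and `vi'ecpe -> vitkycpe`: each pair ties on syllables (four and three), so the tie-breaker favors the longer, more recognizable rafsi. The same syllable-first key would still choose kambla over kamblanu (two syllables against three), so the simple-abstraction case would need a separate rule.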
My concern here is that we are instead minimizing the number of _letters_ (not counting apostrophes). Shouldn't sa'urmi'e (sarcu minde) be sarcyminde, for example, and vi'ecpe (vitke cpedu) be vitkycpe? You get more similarity without increasing the time it takes to say them (or while increasing it only a tiny amount): sa'urmi'e and sarcyminde both have four syllables, and vi'ecpe and vitkycpe both have three.

Then there's the issue of simple abstraction. nunklama seems to be preferred over nunkla when it is not compounded with anything else. Why? I think there's a good reason. Likewise, kamblanu is better than kambla, even though kambla would save two phonemes and a syllable to boot.

Is it already too late to worry about this?

(Side note: although a considerable number of man-hours have gone into the lujvo list and the dictionary, which already contain many lujvo made with the current algorithm, it would not be hard to write a program to convert lujvo automatically; you'd just have to encode the above into an alternate lujvo algorithm. You could even have one that cranks through mixed text, replacing every parsable lujvo.)

| Erik Rauch rauch@cs.yale.edu |