From: John Cowan
Sender: Lojban list
Date: Mon, 17 May 1993 13:31:36 -0400
Subject: Two-level self-segregation
X-To: conlang, Lojban List

Lojban/Loglan (for my present purposes they are one and the same) has frequently been criticized by conlangistanis for various features of its design, notably its "ugly" morphological rules and its allomorphy. The purpose of this essay is to explain why these features are necessary to achieve Lojban's goals, and indeed are mutually dependent.

Self-segregation, the ability to pick words out of a continuous phoneme stream, has often been mentioned on conlang as a useful trait for a constructed language. Some have even said it is mandatory; others have considered it less important than some other goal (such as easy recognizability), but as far as I know no one has deemed it positively harmful.

Various languages other than Lojban have been designed for varying degrees of self-segregation: Vorlin and -gua!spi come to mind, and there are very likely others. These two languages self-segregate at what may be called the "word" level; the phoneme stream can be chopped into lexical units in only one way. Without word-level self-segregation, problems like "night rate" vs. "nitrate" can arise, and various kinds of ambiguous sentences are possible.
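The "night rate" vs. "nitrate" problem can be made concrete with a small sketch. It is only an illustration, not part of any Lojban tool: the function name `segmentations`, the three-word lexicon, and the ad hoc phonemic spellings ("naIt", "reIt") are all assumptions invented for this example. The function lists every way a phoneme stream can be chopped into known words; a language with word-level self-segregation is one in which that list never has more than one entry.

```python
def segmentations(stream, lexicon):
    """Return every way to split `stream` into consecutive lexicon entries."""
    if not stream:
        return [[]]  # exactly one way to segment nothing: the empty parse
    results = []
    for end in range(1, len(stream) + 1):
        head = stream[:end]
        if head in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(stream[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon in an ad hoc phonemic spelling (an assumption).
LEXICON = {"naIt": "night", "reIt": "rate", "naItreIt": "nitrate"}

parses = segmentations("naItreIt", LEXICON)
for parse in parses:
    print(" + ".join(LEXICON[word] for word in parse))
```

With this toy lexicon the stream has two parses, "night + rate" and "nitrate", which is exactly the ambiguity described above.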
One of Lojban's most important goals is structural unambiguity, and so word-level self-segregation is essential.

In the language Voksigid, however, there is no requirement for word-level self-segregation. Instead, some care was taken during the language design to ensure what I will call "morpheme-level self-segregation". This feature of a language requires that words (lexical units) break up into morphemes (meaning units) in only one way. Without morpheme-level self-segregation, the English word "manslaughter" (man-slaughter) could be broken up as "man-s-laughter", something quite different!

Early versions of Loglan (until about 1982) did not have morpheme-level self-segregation. The problem here is not one of ambiguity; a language could be structurally unambiguous even if it had no recognizable morphemes below the lexical level at all. The difficulty is a practical one of vocabulary building. Old Loglan compound words were built by assembling fragments of the root words; however, it was not possible to decompose a compound back into its fragments reliably. This meant that one had to search the entire existing vocabulary to avoid creating a word identical to one already existing. The 1982 reforms (which both current Loglan and Lojban share) provided for morpheme-level self-segregation, at a stroke wiping out the entire existing vocabulary of non-root words (Lojban has also redefined the root words, for legal, non-linguistic reasons), but allowing for unambiguous decomposition of words into morphemes.

Vorlin and -gua!spi avoid the problem by making all words mono-morphemic: there is no distinction between a phrasal compound and a compound word. Lojban/Loglan, however, makes such a distinction: phrases have only loosely constrained semantics, whereas compounds have (in principle) single denotations just as root words do. The only way to allow two-level self-segregation was to distinguish between rules for finding the ends of morphemes and the ends of words.
Therefore, allomorphy was unavoidable: the bound and the free forms of morphemes had to be distinct. Zipf's law suggested that the bound forms be shorter than the free forms, and this was done. Since there are fewer short forms than long forms, for obvious combinatorial reasons, it was not possible to create fixed rules for mapping between the two: there are about seven possible short forms for every long form, of which at most three are in use (other possibilities typically are pre-empted by some other morpheme).

As far as I know, the only other language with two-level self-segregation is Bee (of Plan B), which achieves it by Huffman coding its words and employing phonemes each of which has both a consonantal and an (unrelated) vocalic representation. Bee is only a sketch, however, not a full conlang.

I make no claim that the Loglan/Lojban design for two-level allomorphy is the best possible: it was too strongly constrained by the history of the language and the need to retain the benefits of existing work as much as possible. I would be interested in seeing other designs to the same purpose.

-- 
John Cowan cowan@snark.thyrsus.com ...!uunet!lock60!snark!cowan
e'osai ko sarji la lojban.