From LOJBAN%CUVMB.bitnet@YaleVM.YCC.YALE.EDU  Sat Mar  6 22:51:34 2010
Received: from YALEVM.YCC.YALE.EDU by MINERVA.CIS.YALE.EDU via SMTP; Mon, 17 May 1993 15:05:36 -0400
Received: from CUVMB.CC.COLUMBIA.EDU by YaleVM.YCC.Yale.Edu (IBM VM SMTP V2R2)    with BSMTP id 5743; Mon, 17 May 93 15:04:55 EDT
Received: from CUVMB.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (Mailer R2.07) with  BSMTP id 1565; Mon, 17 May 93 15:05:55 EDT
Date:         Mon, 17 May 1993 13:31:36 -0400
Reply-To: John Cowan <cowan@SNARK.THYRSUS.COM>
Sender: Lojban list <LOJBAN%CUVMB.bitnet@YaleVM.YCC.YALE.EDU>
From: John Cowan <cowan@SNARK.THYRSUS.COM>
Subject:      Two-level self-segregation
X-To:         conlang <conlang@diku.dk>,               Lojban List <lojban@cuvmb.cc.columbia.edu>
To: Erik Rauch <erikr@MINERVA.CIS.YALE.EDU>
X-From-Space-Date: Mon May 17 09:31:36 1993
X-From-Space-Address: @YaleVM.YCC.YALE.EDU:LOJBAN@CUVMB.BITNET
Message-ID: <tl_oPQw9SuD.A.X1H.2z0kLB@chain.digitalkingdom.org>

Lojban/Loglan (for my present purposes they are one and the same) has
frequently been criticized by conlangistanis for various features of its
design, notably its "ugly" morphological rules and its allomorphy.
The purpose of this essay is to explain why these features are necessary
to achieve Lojban's goals, and indeed are mutually dependent.

Self-segregation, the ability to pick words out of a continuous phoneme
stream, has often been mentioned on conlang as a useful trait for a
constructed language.  Some have even said it is mandatory; others have
considered it less important than some other goal (such as easy
recognizability), but as far as I know no one has deemed it positively harmful.

Various languages other than Lojban have been designed for varying degrees
of self-segregation: Vorlin and -gua!spi come to mind, and there are very
likely others.  These two languages self-segregate at what may be called
the "word" level; the phoneme stream can be chopped into lexical units in
only one way.  Without word-level self-segregation, problems like
"night rate" vs. "nitrate" can arise, and various kinds of ambiguous
sentences are possible.  One of Lojban's most important goals is structural
unambiguity, and so word-level self-segregation is essential.

In the language Voksigid, however, there is no requirement for word-level
self-segregation.  Instead, some care was taken during the language design
to ensure what I will call "morpheme-level self-segregation".  This feature
of a language requires that words (lexical units) break up into morphemes
(meaning units) in only one way.  Without morpheme-level self-segregation,
the English word "manslaughter" (man-slaughter) could be broken up as
"man-s-laughter", something quite different!
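
The "manslaughter" case can be made concrete with a short Python sketch that
enumerates every way of splitting a word into pieces drawn from a morpheme
inventory.  The function name `segmentations` and the toy inventory are
illustrative assumptions only, not part of any real language's definition.

```python
def segmentations(word, morphemes):
    """Return every way to split `word` into pieces drawn from `morphemes`."""
    if not word:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in morphemes:
            # Try every segmentation of the remainder after this head.
            for rest in segmentations(word[i:], morphemes):
                results.append([head] + rest)
    return results


# Hypothetical toy inventory, chosen only to reproduce the example above.
TOY_MORPHEMES = {"man", "s", "slaughter", "laughter"}

for parse in segmentations("manslaughter", TOY_MORPHEMES):
    print("-".join(parse))
```

With this inventory the sketch finds two parses, "man-s-laughter" and
"man-slaughter".  In a language with morpheme-level self-segregation, the
morphological rules would leave a well-formed word with only one such parse.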

Early versions of Loglan (until about 1982) did not have morpheme-level
self-segregation.  The problem here is not ambiguity; a language could be
structurally unambiguous if it had no recognizable morphemes below the
lexical level at all.  The difficulty is a practical one of vocabulary
building.  Old Loglan compound words were built by assembling fragments of
the root words; however, it was not possible to decompose the fragments
reliably.  This meant that one had to search the entire existing vocabulary
to avoid coining a word identical to one already in use.

The 1982 reforms (which both current Loglan and Lojban share) provided for
morpheme-level self-segregation.  At a stroke this wiped out the entire
existing vocabulary of non-root words (Lojban has also redefined the root
words, for legal, non-linguistic reasons), but it allowed unambiguous
decomposition of words into morphemes.

Vorlin and -gua!spi avoid the problem by making all words mono-morphemic:
there is no distinction between a phrasal compound and a compound word.
Lojban/Loglan, however, makes such a distinction: phrases have only loosely
constrained semantics, whereas compounds have (in principle) single
denotations just as root words do.  The only way to allow two-level
self-segregation was to distinguish between the rules for finding the ends of
morphemes and those for finding the ends of words.  Therefore, allomorphy was
unavoidable: the bound and
the free forms of morphemes had to be distinct.  Zipf's law suggested that
the bound forms be shorter than the free forms, and this was done.  Since
there are fewer short forms than long forms, for obvious combinatorial
reasons, it was not possible to create fixed rules for mapping between
the two: there are about seven possible short forms for every long form,
of which at most three are in use (other possibilities typically are
pre-empted by some other morpheme).

As far as I know, the only other language with two-level self-segregation
is Bee (of Plan B), which achieves it by Huffman coding its words and employing
phonemes each of which has both a consonantal and an (unrelated) vocalic
representation.  Bee is only a sketch, however, not a full conlang.
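
The reason Huffman coding yields self-segregation is that a Huffman code is
prefix-free: no codeword is the beginning of another, so a stream can be cut
into codewords in only one way, reading left to right.  The Python sketch
below illustrates that property on a made-up four-entry code table; it is
NOT Bee's actual coding, and the names `CODE`, `is_prefix_free`, and `decode`
are assumptions for illustration.

```python
# Hypothetical prefix-free code table (not taken from Plan B).
CODE = {"0": "A", "10": "B", "110": "C", "111": "D"}


def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = sorted(codewords)
    # In sorted order, a word that is a prefix of any other word is also
    # a prefix of its immediate successor, so adjacent checks suffice.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


def decode(stream, code):
    """Cut `stream` into codewords left to right and map them to symbols."""
    symbols, current = [], ""
    for bit in stream:
        current += bit
        if current in code:
            symbols.append(code[current])
            current = ""
    if current:
        raise ValueError("stream ends in the middle of a codeword")
    return symbols


print(is_prefix_free(CODE))
print(decode("0101100111", CODE))
```

Because the table is prefix-free, the decoder never has to look ahead or
backtrack: the first match it finds is the only possible one.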

I make no claim that the Loglan/Lojban design for two-level allomorphy is
the best possible: it was too strongly constrained by the history of the
language and the need to retain the benefits of existing work as much as
possible.  I would be interested in seeing other designs to the same
purpose.

--
John Cowan      cowan@snark.thyrsus.com         ...!uunet!lock60!snark!cowan
                        e'osai ko sarji la lojban.