From cbmvax!uunet!cuvma.bitnet!LOJBAN Mon Mar 2 18:56:58 1992
Return-Path:
Received: by snark.thyrsus.com (/\==/\ Smail3.1.21.1 #21.19) id ; Mon, 2 Mar 92 18:56 EST
Received: by cbmvax.cbm.commodore.com (5.57/UUCP-Project/Commodore 2/8/91) id AA29916; Mon, 2 Mar 92 14:53:44 EST
Received: from rutgers.edu by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA02141; Mon, 2 Mar 92 14:51:22 -0500
Received: from cbmvax.UUCP by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) with UUCP id AA12103; Mon, 2 Mar 92 13:22:43 EST
Received: by cbmvax.cbm.commodore.com (5.57/UUCP-Project/Commodore 2/8/91) id AA12754; Mon, 2 Mar 92 13:06:48 EST
Received: from CUVMB.COLUMBIA.EDU (via uunet.UU.NET) by relay2.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA09793; Mon, 2 Mar 92 12:38:40 -0500
Message-Id: <9203021738.AA09793@relay2.UU.NET>
Received: from CUVMB.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2.1) with BSMTP id 1704; Mon, 02 Mar 92 12:37:10 EST
Received: by CUVMB (Mailer R2.07) id 5416; Mon, 02 Mar 92 12:34:11 EST
Date: Mon, 2 Mar 1992 11:48:14 GMT
Reply-To: CJ FINE
Sender: Lojban list
From: CJ FINE
Subject: Re: morphology
X-To: fschulz@pyramid.com
X-Cc: Lojban list
To: John Cowan
In-Reply-To: <9202292323.AA07581@pyrps5.eng.pyramid.com>; from "fschulz@com.pyramid" at Feb 29, 92 3:23 pm
Status: RO

Frank continues the discussion:
(Lojbab answered much of the following in a mail beginning "I will let Colin answer most of your questions!")

>
> I will reply to lojbab and kolin with one reply since it looks
> like the message I received went to both.
>
> lojbab says my morphology is ambiguous and does not distinguish
> tanru and lujvo.
> I do not understand the distinction between tanru and lujvo.
> What are the differences? I thought lujvo were just compressed tanru.

They are, but the "compression" has as a requirement that lujvo stick together as a single word.
There are at least three reasons why the distinction between fraso prenu (tanru) and frasyprenu/fasprenu/frasypre/faspre (lujvo) is important:

1) a tanru has (by definition) the place structure of its last term - a lujvo may have a different place structure
2) in a more complex tanru, a sequence of brivla may not even be a tanru (eg "carmi fraso prenu" parses as "[carmi fraso] prenu" and does not contain "fraso prenu" as a constituent at all. "carmi faspre" does, obviously, contain "faspre").
3) words like "ba'e" and "zo", which operate on a valsi (a single Lojban word), pick up one brivla, whether it is a gismu or a lujvo. They will not pick up a tanru.

In Loglan before the GMR, gismu and lujvo were indistinguishable by their form - this turned out to be unworkable, hence the changes.

> lojbab writes
> la pier laplas. tadni lo cmaci
> In the written form is
> la pier. laplas.
> necessary or is the pause assumed by default? Or is there an ambiguity
> here which is resolved by knowing the name? I see this as
> Pierrelaplace
> Machine translation will need a special lookup here anyway, so this
> ambiguity might be harmless.

"pier" and "lyples" are valid Lojban cmene, and so is "pierlyples". It is a matter of choice (the se cmene or the te cmene) which is used. Each name must be followed by a pause in speech - in writing this is optional (though recommended).

"laplas" is *not* a valid cmene, as it contains the syllable "la" (twice). Under the new not-yet-baselined proposal, however, "pierlaplas" will be valid, as each "la" is preceded by a consonant.

>
> lojbab mentioned that other morphology structures have been proposed.
> If these are real simple I would like to see them. I would prefer
> to see a morphology which is simple to understand and has a serious
> flaw than one which is correct and impossible for me to understand.
> My idea is to look at several morphology types with different kinds
> of flaws to sneak up on the lojban morphology.
> Of course the flaw
> must be explicitly mentioned, so the flaw is not mistaken for an
> oversight.

The whole question of "simplicity" of morphology needs to be teased apart. There are three separate parts to it:

1) How easy is it for a hearer/reader to parse the speech stream?
2) How robust is it - ie are errors likely to be recognised or to be misparsed as something else?
3) How easy is it for the word-coiner to apply?

The first is much the most important - and on this score, Lojban morphology is simple:

  Divide the speech-stream into pause groups.
  If a group ends with a consonant, it ends with a cmene.
  Otherwise, if it contains a consonant cluster (possibly buffered with "y"), it contains a brivla.
  Otherwise it's made up of cmavo.

Then three rules for finding the boundaries of these:

  A cmene starts after the last "la", "lai", "la'i" or "doi", or at the start of the pause group;
  A brivla starts at the first cluster if that is a permissible initial, or the previous consonant otherwise;
  A brivla ends after the next vowel after the stress.

The rules you need as coiner are more complicated, it is true. And the problem of robustness is why I strongly favour the limitation on the structure of le'avla that was proposed a while ago (and is not yet official, I think?)

>
> When I finally understand the morphology I would like
> to verify the lojban morphology is not ambiguous using formal
> verification techniques and write
> computer code to test words for morphological correctness.
> This would be the morphology analog of a spelling checker.
> I suspect this would turn up a lot of errors. This assumes
> that this has not yet been done.

I believe it has, but I am not sure.

>
> kolin mentions that my use of the term gismu in my toy morphology
> description is not standard. This is correct. Is there some cmavo or
> gismu that means metaphorical? This should have been prefixed to
> gismu. I did not want to coin new vocabulary so I used the words
> incorrectly.
> What I intended was something that had the functional
> properties and behavior of a gismu. Does a generalizer cmavo exist?
> This would express my intention better.

"pe'a"/"po'a" marks a metaphorical expression (only a tanru, I think, but I am not sure of the grammar). But I don't think that was my point - Lojban has the following classification of valsi:

  cmene
  brivla
  cmavo

At the syntactic level, these are the only categories. *Derivationally* (and, by design, morphologically), we can then subdivide brivla into

  gismu
  lujvo
  le'avla

but syntactically they are identical.

kolin
c.j.fine@bradford.ac.uk
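
A rough sketch in Python of the hearer's side of the morphology described in this message: the three rules for classifying a pause group, and the restriction on "la" inside a cmene (both the current rule and the not-yet-baselined proposal). It takes the rules literally and is not an official algorithm. The function names, the consonant inventory written out as a string, and the decision to leave out "doi", stress, and the brivla boundary rules are all assumptions made for illustration.

```python
import re

# Assumed Lojban consonant inventory; the apostrophe is not treated as a consonant.
CONSONANTS = "bcdfgjklmnprstvxz"

# Two consonants in a row, possibly buffered with "y" between them.
CLUSTER = re.compile(f"[{CONSONANTS}]y?[{CONSONANTS}]")


def classify_pause_group(group: str) -> str:
    """Apply the three classification rules to one pause group.

    Returns "cmene" if the group ends with a cmene, "brivla" if it
    contains a brivla, and "cmavo" if it is made up of cmavo only.
    Spaces and written pause marks (".") are dropped first.
    """
    text = re.sub(r"[^a-z']", "", group.lower())
    if not text:
        raise ValueError("empty pause group")
    if text[-1] in CONSONANTS:
        return "cmene"
    if CLUSTER.search(text):
        return "brivla"
    return "cmavo"


def la_rule_ok(name: str, proposed: bool = False) -> bool:
    """Check a candidate cmene against the "la" restriction only.

    Current rule: any "la" inside the name makes it invalid.
    Proposed rule: a "la" is allowed when a consonant precedes it.
    Other requirements on cmene are not checked here.
    """
    text = name.lower().strip(".")
    for match in re.finditer("la", text):
        start = match.start()
        after_consonant = start > 0 and text[start - 1] in CONSONANTS
        if not (proposed and after_consonant):
            return False
    return True
```

For example, classify_pause_group("carmi faspre") finds the cluster "rm" and reports a brivla, while la_rule_ok("pierlaplas") fails under the current rule and passes with proposed=True. The sketch only classifies a pause group; splitting it into individual words needs the three boundary rules, which depend on stress and on the table of permissible initials.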