Date: Tue, 29 Jan 2013 22:05:26 -0300
From: Jorge Llambías
To: jbovlaste@lojban.org
Subject: Re: [jbovlaste] berbere, berberi
In-Reply-To: <20130130001016.GG16924@mercury.ccil.org>

On Tue, Jan 29, 2013 at 9:10 PM, John Cowan wrote:
> Jorge Llambías scripsit:
>
>> Since we don't need to detect LALR-n-ambiguity anyway, why would
>> this limitation of a PEG make it not good enough to parse the Lojban
>> morphology?
>
> Let me use a greatly oversimplified example. Suppose we are writing a
> morphology program to parse a word into a sequence of morphemes.
> We define a morpheme as having the form V, CV, or CVn, where V and C
> are any vowel and any consonant respectively. If C does not include n,
> this grammar is obviously unambiguous, as there is only one way to parse
> any valid word into a sequence of morphemes. If C does include n, this
> grammar is obviously ambiguous: we do not know if "jana" parses as "jan a"
> or "ja na".
>
> Now if we write a YACC grammar for the latter case, like this:
>
> C : 'j' | 'k' | 'l' | 'm' | 'n';
> V : 'a' | 'e' | 'i' | 'o' | 'u';
> morpheme: V | C V | C V 'n';
> word : morpheme | word morpheme;
>
> Yacc will tell us that there is a shift-reduce error. This reflects
> the fact that the grammar is ambiguous, and therefore unsuited for a
> Lojban-style language.
>
> But if we write a PEG grammar,

But we cannot do that for that language! It's simply impossible to
write an ambiguous PEG grammar.

> we will not get a complaint: it will be all
> about whether the morpheme rule is written as C V 'n' / C V / V (which
> will prefer the parse "jan a")

and therefore does not correspond to your ambiguous language.

> or C V / C V 'n' / V, (which will prefer
> the parse "ja na").

and therefore also does not match your ambiguous language.

Either of those two PEG grammars would be suitable for a language like
Lojban, unlike the third grammar flagged as ambiguous by Yacc.
PEG will never even find that unsuitable third grammar.

> It is in this sense that a PEG grammar is unsuitable
> for Lojban: precisely because the PEG grammar settles all ambiguities in
> advance, we cannot be sure that the text has only one possible analysis.

But the text does have one possible analysis for each of the two PEG
grammars: "jan a" for one of the grammars and "ja na" for the other
grammar. They are two different grammars, each unambiguous. One of them
could be a language like Lojban. The ambiguous third language that PEG
cannot handle could never be Lojban anyway, so why should we care that
PEG cannot represent it? If the Lojban morphology is defined by a PEG
grammar, it is unambiguous. There's nothing unsuitable about that.

> The only way to be sure is to put each alternation rule in the PEG into
> every possible order, and make sure that all texts parse the same way
> with all the variants.

That's easy to do. You just replace every (A / B) by its equivalent
(A | !A B). It doesn't matter in which order you test A and !A B because
at most only one can ever succeed. The disadvantage of doing this is
that the parsing is more inefficient, but if you don't care about
efficiency it doesn't make a difference, and now the rules can be
applied in any order.

mu'o mi'e xorxes
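
The toy grammar discussed above is small enough to check by hand, but a short sketch may make the three claims concrete: the context-free version admits two segmentations of "jana", each PEG ordering picks exactly one of them, and rewriting (A / B) as (A | !A B) makes the result independent of the order in which alternatives are tried. This is only an illustration in Python, not a real PEG engine and not anything from the Lojban tooling; all function names (`match_v`, `match_cv`, `match_cvn`, `parse_word`, `all_segmentations`, `guarded`) are hypothetical, and `parse_word` assumes the rule `word <- morpheme+` followed by end of input.

```python
from itertools import permutations

CONSONANTS = set("jklmn")  # C includes 'n', the ambiguous case from the thread
VOWELS = set("aeiou")

# Each matcher returns the index just past the match, or None on failure.
def match_v(s, i):
    return i + 1 if i < len(s) and s[i] in VOWELS else None

def match_cv(s, i):
    if i + 1 < len(s) and s[i] in CONSONANTS and s[i + 1] in VOWELS:
        return i + 2
    return None

def match_cvn(s, i):
    j = match_cv(s, i)
    return j + 1 if j is not None and j < len(s) and s[j] == "n" else None

def parse_word(s, alternatives):
    """PEG-style word <- morpheme+ : at each position the first alternative
    that matches is committed to; earlier choices are never revisited."""
    i, morphemes = 0, []
    while i < len(s):
        for alt in alternatives:
            j = alt(s, i)
            if j is not None:
                morphemes.append(s[i:j])
                i = j
                break
        else:
            return None  # no alternative matched: the whole parse fails
    return morphemes or None

def all_segmentations(s, i=0):
    """Every parse allowed by the unordered (context-free) grammar."""
    if i == len(s):
        return [[]]
    found = []
    for alt in (match_v, match_cv, match_cvn):
        j = alt(s, i)
        if j is not None:
            found += [[s[i:j]] + rest for rest in all_segmentations(s, j)]
    return found

def guarded(alternatives):
    """Rewrite A / B / C as A | !A B | !A !B C, so that at most one
    alternative can succeed at any position."""
    out = []
    for k, alt in enumerate(alternatives):
        def g(s, i, alt=alt, earlier=tuple(alternatives[:k])):
            if any(e(s, i) is not None for e in earlier):
                return None  # negative lookahead on an earlier alternative
            return alt(s, i)
        out.append(g)
    return out

if __name__ == "__main__":
    print(all_segmentations("jana"))                           # both parses
    print(parse_word("jana", [match_cvn, match_cv, match_v]))  # one PEG
    print(parse_word("jana", [match_cv, match_cvn, match_v]))  # the other PEG
    for perm in permutations(guarded([match_cvn, match_cv, match_v])):
        print(parse_word("jana", perm))                        # same every time
```

In this sketch the context-free enumeration yields both "ja na" and "jan a", while each ordered-choice grammar yields exactly one of them, and every permutation of the guarded alternatives gives the same parse as the ordering it was derived from.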