MIME-Version: 1.0
In-Reply-To: <20130130001016.GG16924@mercury.ccil.org>
References: <CAO7bV+gfG2RK+vJLb2WLVA0VGeD3HhXX=Y3cwvq-G6AefwpT4g@mail.gmail.com>
 <20130124175134.GA14317@mercury.ccil.org>
 <51017FF7.504@plasmatix.com>
 <20130124221349.GB20636@mercury.ccil.org>
 <CAO7tK2dx4ps9J1n8o+qRiWb2mPJzR=SEvBDGMHkaO90RseXkEQ@mail.gmail.com>
 <20130125151703.GB20813@mercury.ccil.org>
 <CAO7tK2coVb2JagQOpUm-P211U8b_045dL8DGb+QTJBNAZiiLFg@mail.gmail.com>
 <20130126232527.GG13680@mercury.ccil.org>
 <CAO7tK2df_wDMFqwY_0uQdG1xUvojB4Btd2BW2G=GBZHtSDdxRQ@mail.gmail.com>
 <20130130001016.GG16924@mercury.ccil.org>
Date: Tue, 29 Jan 2013 22:05:26 -0300
Message-ID: <CAO7tK2eOqCnyZ2gYBR05r=93dDx5=74_q9OahWvZF1Gnsb76Gg@mail.gmail.com>
From: =?ISO-8859-1?Q?Jorge_Llamb=EDas?= <jjllambias@gmail.com>
To: jbovlaste@lojban.org
Subject: Re: [jbovlaste] berbere, berberi

On Tue, Jan 29, 2013 at 9:10 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Jorge Llambías scripsit:
>
>> Since we don't need to detect LALR-n-ambiguity anyway, why would
>> this limitation of a PEG make it not good enough to parse the Lojban
>> morphology?
>
> Let me use a greatly oversimplified example.  Suppose we are writing a
> morphology program to parse a word into a sequence of morphemes.
> We define a morpheme as having the form V, CV, or CVn, where V and C
> are any vowel and any consonant respectively.  If C does not include n,
> this grammar is obviously unambiguous, as there is only one way to parse
> any valid word into a sequence of morphemes.  If C does include n, this
> grammar is obviously ambiguous: we do not know if "jana" parses as "jan a"
> or "ja na".
>
> Now if we write a YACC grammar for the latter case, like this:
>
> C : 'j' | 'k' | 'l' | 'm' | 'n';
> V : 'a' | 'e' | 'i' | 'o' | 'u';
> morpheme: V | C V | C V 'n';
> word : morpheme | word morpheme;
>
> Yacc will tell us that there is a shift-reduce error.  This reflects
> the fact that the grammar is ambiguous, and therefore unsuited for a
> Lojban-style language.
>
> But if we write a PEG grammar,

But we cannot do that for that language! It's simply impossible to
write an ambiguous PEG grammar.

> we will not get a complaint: it will be all
> about whether the morpheme rule is written as C V 'n' / C V / V (which
> will prefer the parse "jan a")

and therefore does not correspond to your ambiguous language.

> or C V / C V 'n' / V, (which will prefer
> the parse "ja na").

and therefore also does not match your ambiguous language. Either of
those two PEG grammars would be suitable for a language like Lojban,
unlike the third grammar flagged as ambiguous by Yacc. A PEG can never
even express that unsuitable third grammar.

> It is in this sense that a PEG grammar is unsuitable
> for Lojban: precisely because the PEG grammar settles all ambiguities in
> advance, we cannot be sure that the text has only one possible analysis.

But the text does have one possible analysis for each of the two PEG
grammars: "jan a" for one of the grammars and "ja na" for the other
grammar. They are two different grammars, each unambiguous. One of
them could be a language like Lojban. The ambiguous third language
that PEG cannot handle could never be Lojban anyway, so why should we
care that PEG cannot represent it?
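
To make that concrete, here is a minimal Python sketch of the two PEG
orderings of the morpheme rule. It is only an illustration of the toy
grammar from this thread, not anyone's actual Lojban parser, and all the
function and variable names (match_v, parse_word, PREFER_CVN, ...) are
made up for the example. Each ordering yields exactly one analysis of
"jana".

```python
CONSONANTS = set("jklmn")
VOWELS = set("aeiou")


def match_v(text, pos):
    """V: return the end position if a vowel matches at pos, else None."""
    if pos < len(text) and text[pos] in VOWELS:
        return pos + 1
    return None


def match_cv(text, pos):
    """C V: a consonant followed by a vowel."""
    if pos < len(text) and text[pos] in CONSONANTS:
        return match_v(text, pos + 1)
    return None


def match_cvn(text, pos):
    """C V 'n': a consonant, a vowel, then a literal 'n'."""
    end = match_cv(text, pos)
    if end is not None and end < len(text) and text[end] == "n":
        return end + 1
    return None


def parse_word(text, alternatives):
    """word <- morpheme+ !.  where morpheme is an ordered choice.

    The alternatives are tried in the given order and the first one that
    succeeds is committed to, as in a PEG. Returns the list of morphemes,
    or None if the text is not a word of this grammar.
    """
    pos, morphemes = 0, []
    while pos < len(text):
        for alternative in alternatives:
            end = alternative(text, pos)
            if end is not None:
                break
        else:
            return None  # no alternative matched at this position
        morphemes.append(text[pos:end])
        pos = end
    return morphemes or None


# morpheme <- C V 'n' / C V / V
PREFER_CVN = [match_cvn, match_cv, match_v]
# morpheme <- C V / C V 'n' / V
PREFER_CV = [match_cv, match_cvn, match_v]

print(parse_word("jana", PREFER_CVN))  # ['jan', 'a']
print(parse_word("jana", PREFER_CV))   # ['ja', 'na']
```

Neither call ever reports two analyses: each list of alternatives is a
different grammar, and each grammar assigns "jana" a single parse.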

If the Lojban morphology is defined by a PEG grammar, it is
unambiguous. There's nothing unsuitable about that.

> The only way to be sure is to put each alternation rule in the PEG into
> every possible order, and make sure that all texts parse the same way
> with all the variants.

That's easy to do. You just replace every (A / B) with its equivalent
(A | !A B). It doesn't matter in which order you test A and !A B
because at most one of them can ever succeed. The disadvantage of doing
this is that parsing is less efficient, but if you don't care
about efficiency it makes no difference, and now the rules can
be applied in any order.
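
A rough Python sketch of that rewriting, under the assumption that a
parser is just a function from (text, pos) to an end position or None.
The combinator names (lit, seq, not_, ordered, all_results) are
hypothetical helpers written for this example, not part of any real PEG
library. A and B are stand-ins chosen so that both can match "jana".

```python
from itertools import permutations


def lit(s):
    """Match the literal string s at pos."""
    def parser(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parser


def seq(*parsers):
    """Match each parser in turn; fail if any of them fails."""
    def parser(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parser


def not_(inner):
    """PEG negative lookahead !A: succeed, consuming nothing, iff A fails."""
    def parser(text, pos):
        return pos if inner(text, pos) is None else None
    return parser


def ordered(*parsers):
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parser(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parser


def all_results(parsers, text, pos):
    """Unordered choice: every end position that any alternative produces."""
    results = set()
    for p in parsers:
        end = p(text, pos)
        if end is not None:
            results.add(end)
    return results


A = lit("jan")
B = lit("ja")

# Plain alternatives: the result depends on the order they are tried in.
print(ordered(A, B)("jana", 0))           # 3
print(ordered(B, A)("jana", 0))           # 2
print(all_results([A, B], "jana", 0))     # two different end positions

# Guarded alternatives (A | !A B): at most one can succeed, so the
# order of testing no longer matters.
guarded = [A, seq(not_(A), B)]
print(all_results(guarded, "jana", 0))    # {3}
print(all_results(guarded, "jako", 0))    # {2}
print({ordered(*perm)("jana", 0) for perm in permutations(guarded)})  # {3}
```

The same idea extends to longer choices: (A / B / C) becomes
(A | !A B | !A !B C), at the cost of re-running A and B inside the
lookaheads.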

mu'o mi'e xorxes
