Message-Id: <199602101623.LAA13910@locke.ccil.org>
Date:         Sat, 10 Feb 1996 10:49:34 -0500
Reply-To: Logical Language Group <lojbab@ACCESS.DIGEX.NET>
Sender: Lojban list <LOJBAN@CUVMB.BITNET>
From: Logical Language Group <lojbab@ACCESS.DIGEX.NET>
Subject:      *old response on LR(K)
To: John Cowan <cowan@LOCKE.CCIL.ORG>
Status: OR
Content-Length: 3244

From: Chris Bogart <cbogart@QUETZAL.COM>
Subject:      Re: LR(k) Lojban Grammar

>chris:
>>>>eventually knock wood there'll
>>>>be academic interest in the language and we'll want to be able to
>>>>define it in a more theoretically understandable way.  So it ought to
>>>>be at least a long-term goal.
>
>>This assumes that lojban has an LR(k) grammar...
>
>Well, if ly. has a non-LR(k) grammar, that would be slow to parse using
>a generic yacc-like tool for whatever class of grammar it has, then we
>don't want to use that parser day-in and day-out.  But we'll still want
>to do that work for theoretical reasons.

The main problem you have is in proving that any given grammar is
provable identical to the official grammar, something that I am sure
would be necessary for theoretical work.  As it is, that is one reason I
pay no attention to the E-BNF - we have no way to prove that IT is the
same as the YACC grammar - only Cowan's admitted intricate knowledge of
the grammar and YACC parsing.  But that isn't going to be good enough
for academics.

>>I can't find the
>>reference right now, but the last I remember was that the grammar
>>was not actually completely context-free -- there were some shift-reduce
>>conflicts
>
>Do shift-reduce conflicts mean the language is not context-free?  Or do
>they mean the language is not LR(1)?

There are no s/r conflicts permitted in any official version of the
grammar.  The main problem we have right now is that the 2.33 parser
from a couple of years back works fine, changes 2.34 and 2.35 have been
approved and a parser was generated (and is probably the one used by
most people, though I never switched), but has a known problem that some
Lojban texts cause it to blow up which Cowan has had no time to debug.
Alas, we may not have any copies of the 2.33 parser source for doing
diffs, so debugging is all that much harder.  Cowan has successfully
YACCed all proposed changes through change proposal 40, though they have
not all been integrated into one whole, nor a parser built and tested.

The last grammar to be dummy checked was something like 2.08.  Dummy
checking involves the rather time consuming job of setting up numbers
for the random sentence generator, seeing what it generates, and if
there is anything uncertain, verifying that all generated texts parse in
the same form they were generated.  We have not found errors by dummy
checking since before the grammar baseline.

>>...that YACC resolved using precedence of rules/productions.
>>If S-R or R-R conflicts exist, the language is not LR(1), regardless
>>of whether or not YACC can parse it; syntactic ambiguity exists,
>>it's just hidden by the mechanics of the parser.

In addition, all of the lexer stuff in rules numbered over 900 are
shielded from the rest of the grammar using the invisible lexer-selma'o.
This in effect makes the YACC parser a 2+ stage parser, where all stuff
at the lexer level is resolved and tagged, and THEN the main grammar
must parse according to LR(1) with error correction.

>If the parser is the official language definition, then the mechanics of
>the parser are part of the definition as well.

This is true, and why we include the lexer algorithm as part of the YACC
grammar.

lojbab