Received: from wnt.dc.lsoft.com (wnt.dc.lsoft.com [205.186.43.7]) by locke.ccil.org (8.6.9/8.6.10) with ESMTP id LAA13910 for ; Sat, 10 Feb 1996 11:23:12 -0500 Message-Id: <199602101623.LAA13910@locke.ccil.org> Received: from PEACH.EASE.LSOFT.COM (205.186.43.4) by wnt.dc.lsoft.com (LSMTP for Windows NT v1.0a) with SMTP id 4ABD89B0 ; Sat, 10 Feb 1996 10:48:20 -0500 Date: Sat, 10 Feb 1996 10:49:34 -0500 Reply-To: Logical Language Group Sender: Lojban list From: Logical Language Group Subject: *old response on LR(K) X-To: lojban@cuvmb.cc.columbia.edu To: John Cowan Status: OR X-Mozilla-Status: 0001 Content-Length: 3244 X-From-Space-Date: Sat Feb 10 11:23:17 1996 X-From-Space-Address: LOJBAN%CUVMB.BITNET@UBVM.CC.BUFFALO.EDU From: Chris Bogart Subject: Re: LR(k) Lojban Grammar >chris: >>>>eventually knock wood there'll >>>>be academic interest in the language and we'll want to be able to >>>>define it in a more theoretically understandable way. So it ought to >>>>be at least a long-term goal. > >>This assumes that lojban has an LR(k) grammar... > >Well, if ly. has a non-LR(k) grammar, that would be slow to parse using >a generic yacc-like tool for whatever class of grammar it has, then we >don't want to use that parser day-in and day-out. But we'll still want >to do that work for theoretical reasons. The main problem you have is in proving that any given grammar is provable identical to the official grammar, something that I am sure would be necessary for theoretical work. As it is, that is one reason I pay no attention to the E-BNF - we have no way to prove that IT is the same as the YACC grammar - only Cowan's admitted intricate knowledge of the grammar and YACC parsing. But that isn't going to be good enough for academics. >>I can't find the >>reference right now, but the last I remember was that the grammar >>was not actually completely context-free -- there were some shift-reduce >>conflicts > >Do shift-reduce conflicts mean the language is not context-free? Or do >they mean the language is not LR(1)? There are no s/r conflicts permitted in any official version of the grammar. The main problem we have right now is that the 2.33 parser from a couple of years back works fine, changes 2.34 and 2.35 have been approved and a parser was generated (and is probably the one used by most people, though I never switched), but has a known problem that some Lojban texts cause it to blow up which Cowan has had no time to debug. Alas, we may not have any copies of the 2.33 parser source for doing diffs, so debugging is all that much harder. Cowan has successfully YACCed all proposed changes through change proposal 40, though they have not all been integrated into one whole, nor a parser built and tested. The last grammar to be dummy checked was something like 2.08. Dummy checking involves the rather time consuming job of setting up numbers for the random sentence generator, seeing what it generates, and if there is anything uncertain, verifying that all generated texts parse in the same form they were generated. We have not found errors by dummy checking since before the grammar baseline. >>...that YACC resolved using precedence of rules/productions. >>If S-R or R-R conflicts exist, the language is not LR(1), regardless >>of whether or not YACC can parse it; syntactic ambiguity exists, >>it's just hidden by the mechanics of the parser. In addition, all of the lexer stuff in rules numbered over 900 are shielded from the rest of the grammar using the invisible lexer-selma'o. This in effect makes the YACC parser a 2+ stage parser, where all stuff at the lexer level is resolved and tagged, and THEN the main grammar must parse according to LR(1) with error correction. >If the parser is the official language definition, then the mechanics of >the parser are part of the definition as well. This is true, and why we include the lexer algorithm as part of the YACC grammar. lojbab