From cowan@ccil.org Mon Aug 27 20:34:33 2001
Return-Path: <cowan@mercury.ccil.org>
X-Sender: cowan@mercury.ccil.org
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: mail-7_3_2); 28 Aug 2001 03:34:33 -0000
Received: (qmail 51277 invoked from network); 28 Aug 2001 03:34:32 -0000
Received: from unknown (10.1.10.142)
  by l7.egroups.com with QMQP; 28 Aug 2001 03:34:32 -0000
Received: from unknown (HELO mercury.ccil.org) (192.190.237.100)
  by mta3 with SMTP; 28 Aug 2001 03:34:32 -0000
Received: from cowan by mercury.ccil.org with local (Exim 3.12 #1 (Debian))
  id 15bZeF-0000o1-00; Mon, 27 Aug 2001 23:34:43 -0400
Subject: Re: [lojban] LALR1 question
In-Reply-To: <LPBBJKMNINKHACNDIIGMAEBJEKAA.a.rosta@dtn.ntl.com> from And Rosta
  at "Aug 28, 2001 03:07:26 am"
To: a.rosta@ntlworld.com
Date: Mon, 27 Aug 2001 23:34:43 -0400 (EDT)
Cc: lojban@yahoogroups.com
X-Mailer: ELM [version 2.4ME+ PL66 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-Id: <E15bZeF-0000o1-00@mercury.ccil.org>
X-eGroups-From: John Cowan <cowan@mercury.ccil.org>
From: John Cowan <cowan@ccil.org>

And Rosta scripsit:

> So "shifting" means "adding to the innermost incomplete phrase" and
> "reducing" means "terminating the innermost incomplete phrase"? So
> presumably on reaching a given word it is possible to reduce several
> times and then shift?

Just so. But not all reductions that can be made are made; the
lookahead token is also considered.

COnsider the following slightly lunatic grammar, where NP is the start:

NP <- NP1
NP <- NP1 'and' NP
NP1 <- 'also' NP 'as:well'
NP1 <- NP1 'so:they:say'
NP1 <- Name

where Name is a terminal. So here are some grammatical strings:

John
John, so they say
John and James
John, so they say, and James
John, so they say, so they say, and also James and George as well
also John and James as well, and also George and Frank, so they say, as well

Now consider the string "John and James so:they:say". "John" will be shifted
onto the stack; likewise with "and". Now "James" is the current token.
What to do? We could reduce "John and James" to NP, but that is wrong:
if we do so, we end up shifting "so:they:say" and then failing, because
"NP so:they:say" is not the RHS of any rule. We have proved that this grammar
is not LR(0).

Instead, we must shift "James", then shift "so:they:say", then reduce the two
of them to "NP1", and then "John and NP1" to "NP", which is the start symbol. 
(I am skipping obvious steps like reducing names to NP1s.)

> If I understand this right, then, what I still don't understand is this.
> As you say, at word 3, the parser has two choices, shift or reduce.
> Now it seems to me that at this point it's not allowed to look at word
> 4 [...]

That's not really the problem in this case. The problem arises
because elidable terminators are only inserted into strings that
would otherwise be ungrammatical, and only if the error is detected
at the right point. So the grammar of descriptions is

Descr <- LE sumti KU
Descr <- LE sumti ERROR

where ERROR is a token introduced by the parser when it can neither
shift nor reduce. But in

le broda joi le ...

"le" is shifted, so is "broda", and so is "joi". Then the next word
must be a brivla (or equivalent construct), and "le" is an error.
This does not happen with

le broda .e le ...

because as soon as ".e" is seen, an error results, and the parser
recovers by inserting "ku" (followed by shift and reduce to "description").
But "joi" is legal within a description, unlike ".e", so the opportunity
to insert an elidable terminator is irrevocably lost.

-- 
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
Please leave your values | Check your assumptions. In fact,
at the front desk. | check your assumptions at the door.
--sign in Paris hotel | --Miles Vorkosigan