From jay.kominek@colorado.edu Mon Aug 27 10:43:35 2001
Return-Path: <kominek@ucsub.colorado.edu>
X-Sender: kominek@ucsub.colorado.edu
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: mail-7_3_2); 27 Aug 2001 17:43:35 -0000
Received: (qmail 9447 invoked from network); 27 Aug 2001 17:20:54 -0000
Received: from unknown (10.1.10.27)
  by l8.egroups.com with QMQP; 27 Aug 2001 17:20:54 -0000
Received: from unknown (HELO ucsub.colorado.edu) (128.138.129.12)
  by mta2 with SMTP; 27 Aug 2001 17:20:40 -0000
Received: from ucsub.colorado.edu (kominek@ucsub.colorado.edu [128.138.129.12])
	by ucsub.colorado.edu (8.11.6/8.11.2/ITS-5.0/student) with ESMTP id f7RHHt327718
	for <lojban@yahoogroups.com>; Mon, 27 Aug 2001 11:17:55 -0600 (MDT)
Date: Mon, 27 Aug 2001 11:17:55 -0600 (MDT)
To: <lojban@yahoogroups.com>
Subject: Re: [lojban] LALR1 question
In-Reply-To: <LPBBJKMNINKHACNDIIGMMEALEKAA.a.rosta@ntlworld.com>
Message-ID: <Pine.GSO.4.33.0108271059530.15411-100000@ucsub.colorado.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
From: Jay Kominek <jay.kominek@colorado.edu>
X-Yahoo-Message-Num: 10163


On Mon, 27 Aug 2001, And Rosta wrote:

> Take the notorious example:
>
> le broda joi le brode
> 1  2     3   4
>
> If "1-token lookahead" means that the decision about how to parse
> word n can be made on the basis of knowing what word n+1 is, then
> how come it won't work in this case? You get to word 3 & think
> "selbri linkage or sumti linkage?", so you take a look at word 4,
> see it's a sumti and so parse word 3 as sumti linkage.
>
> What actually happens, apparently, is that when you hit word 3
> you're not allowed to see word 4. Can someone explain this to me,
> please? That is, explain what lookahead is.

When the parser hits word 3, it is in the middle of building a selbri, and
is thinking only about a selbri (in this case). It has two choices, shift
or reduce. Shifting joi onto the stack is what is causing the current
problem, you get (le (broda joi)), then the parser looks at word 4, and
the contents of the stack it has built so far, and goes 'oh shit, i
shifted the wrong thing.', so it explodes.

The action that you want, in this case, is for the parser to reduce
(instead of shift), and turn the current stack contents (le (broda)) into
a sumti. Then it will reexamine joi and decide that it will be connecting
two sumti instead of two selbri. (However, YACC always shifts instead of
reducing if there is a conflict.)

Without back-tracking or more look ahead, an *LR parser can't do that,
though.

Hrm. A parser with back tracking or more look ahead (or basically any
parser which can parse a larger class of languages) could take the same
YACC grammar and be able to parse "le broda joi le brode" correctly... I
think...

(You seem to have the idea of lookahead right, in that you look at tokens
up to n+1 to decide what n is, you're just missing out on the way the
parsing represents the partially-parsed data.)

- Jay Kominek <jay.kominek@colorado.edu>
  Plus =C3=A7a change, plus c'est la m=C3=AAme chose