From jay.kominek@colorado.edu Mon Aug 27 10:43:35 2001 Return-Path: X-Sender: kominek@ucsub.colorado.edu X-Apparently-To: lojban@yahoogroups.com Received: (EGP: mail-7_3_2); 27 Aug 2001 17:43:35 -0000 Received: (qmail 9447 invoked from network); 27 Aug 2001 17:20:54 -0000 Received: from unknown (10.1.10.27) by l8.egroups.com with QMQP; 27 Aug 2001 17:20:54 -0000 Received: from unknown (HELO ucsub.colorado.edu) (128.138.129.12) by mta2 with SMTP; 27 Aug 2001 17:20:40 -0000 Received: from ucsub.colorado.edu (kominek@ucsub.colorado.edu [128.138.129.12]) by ucsub.colorado.edu (8.11.6/8.11.2/ITS-5.0/student) with ESMTP id f7RHHt327718 for ; Mon, 27 Aug 2001 11:17:55 -0600 (MDT) Date: Mon, 27 Aug 2001 11:17:55 -0600 (MDT) To: Subject: Re: [lojban] LALR1 question In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE From: Jay Kominek X-Yahoo-Message-Num: 10163 On Mon, 27 Aug 2001, And Rosta wrote: > Take the notorious example: > > le broda joi le brode > 1 2 3 4 > > If "1-token lookahead" means that the decision about how to parse > word n can be made on the basis of knowing what word n+1 is, then > how come it won't work in this case? You get to word 3 & think > "selbri linkage or sumti linkage?", so you take a look at word 4, > see it's a sumti and so parse word 3 as sumti linkage. > > What actually happens, apparently, is that when you hit word 3 > you're not allowed to see word 4. Can someone explain this to me, > please? That is, explain what lookahead is. When the parser hits word 3, it is in the middle of building a selbri, and is thinking only about a selbri (in this case). It has two choices, shift or reduce. Shifting joi onto the stack is what is causing the current problem, you get (le (broda joi)), then the parser looks at word 4, and the contents of the stack it has built so far, and goes 'oh shit, i shifted the wrong thing.', so it explodes. The action that you want, in this case, is for the parser to reduce (instead of shift), and turn the current stack contents (le (broda)) into a sumti. Then it will reexamine joi and decide that it will be connecting two sumti instead of two selbri. (However, YACC always shifts instead of reducing if there is a conflict.) Without back-tracking or more look ahead, an *LR parser can't do that, though. Hrm. A parser with back tracking or more look ahead (or basically any parser which can parse a larger class of languages) could take the same YACC grammar and be able to parse "le broda joi le brode" correctly... I think... (You seem to have the idea of lookahead right, in that you look at tokens up to n+1 to decide what n is, you're just missing out on the way the parsing represents the partially-parsed data.) - Jay Kominek Plus =C3=A7a change, plus c'est la m=C3=AAme chose