From cowan@ccil.org Mon Aug 27 20:34:33 2001 Return-Path: X-Sender: cowan@mercury.ccil.org X-Apparently-To: lojban@yahoogroups.com Received: (EGP: mail-7_3_2); 28 Aug 2001 03:34:33 -0000 Received: (qmail 51277 invoked from network); 28 Aug 2001 03:34:32 -0000 Received: from unknown (10.1.10.142) by l7.egroups.com with QMQP; 28 Aug 2001 03:34:32 -0000 Received: from unknown (HELO mercury.ccil.org) (192.190.237.100) by mta3 with SMTP; 28 Aug 2001 03:34:32 -0000 Received: from cowan by mercury.ccil.org with local (Exim 3.12 #1 (Debian)) id 15bZeF-0000o1-00; Mon, 27 Aug 2001 23:34:43 -0400 Subject: Re: [lojban] LALR1 question In-Reply-To: from And Rosta at "Aug 28, 2001 03:07:26 am" To: a.rosta@ntlworld.com Date: Mon, 27 Aug 2001 23:34:43 -0400 (EDT) Cc: lojban@yahoogroups.com X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: X-eGroups-From: John Cowan From: John Cowan X-Yahoo-Message-Num: 10205 And Rosta scripsit: > So "shifting" means "adding to the innermost incomplete phrase" and > "reducing" means "terminating the innermost incomplete phrase"? So > presumably on reaching a given word it is possible to reduce several > times and then shift? Just so. But not all reductions that can be made are made; the lookahead token is also considered. COnsider the following slightly lunatic grammar, where NP is the start: NP <- NP1 NP <- NP1 'and' NP NP1 <- 'also' NP 'as:well' NP1 <- NP1 'so:they:say' NP1 <- Name where Name is a terminal. So here are some grammatical strings: John John, so they say John and James John, so they say, and James John, so they say, so they say, and also James and George as well also John and James as well, and also George and Frank, so they say, as well Now consider the string "John and James so:they:say". "John" will be shifted onto the stack; likewise with "and". Now "James" is the current token. What to do? We could reduce "John and James" to NP, but that is wrong: if we do so, we end up shifting "so:they:say" and then failing, because "NP so:they:say" is not the RHS of any rule. We have proved that this grammar is not LR(0). Instead, we must shift "James", then shift "so:they:say", then reduce the two of them to "NP1", and then "John and NP1" to "NP", which is the start symbol. (I am skipping obvious steps like reducing names to NP1s.) > If I understand this right, then, what I still don't understand is this. > As you say, at word 3, the parser has two choices, shift or reduce. > Now it seems to me that at this point it's not allowed to look at word > 4 [...] That's not really the problem in this case. The problem arises because elidable terminators are only inserted into strings that would otherwise be ungrammatical, and only if the error is detected at the right point. So the grammar of descriptions is Descr <- LE sumti KU Descr <- LE sumti ERROR where ERROR is a token introduced by the parser when it can neither shift nor reduce. But in le broda joi le ... "le" is shifted, so is "broda", and so is "joi". Then the next word must be a brivla (or equivalent construct), and "le" is an error. This does not happen with le broda .e le ... because as soon as ".e" is seen, an error results, and the parser recovers by inserting "ku" (followed by shift and reduce to "description"). But "joi" is legal within a description, unlike ".e", so the opportunity to insert an elidable terminator is irrevocably lost. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org Please leave your values | Check your assumptions. In fact, at the front desk. | check your assumptions at the door. --sign in Paris hotel | --Miles Vorkosigan