Date: Wed, 16 Aug 2006 07:33:31 -0500
From: "Chris Capel"
Reply-To: pdf23ds@gmail.com
To: lojban@yahoogroups.com
Subject: [lojban] Re: parsing with error detection and recovery

On 8/16/06, John Leuner wrote:
[rearranged a bit]

> I'm confused about how
> the parser would distinguish between rules failing "naturally" (in a
> successful parse there will be many points at which various rules
> failed) and those failures which would cause a larger unit to fail, eg a
> "sentence" or a "selbri".

Rules failing would never be an error. Instead, there would be a group of rules that are flagged in the grammar as being error conditions when they *succeed*. (The flag is interpreted by the user of the parse tree; it doesn't actually affect the parser.) These rules actively match the different ways to mess up the input. This would require a lot of rules.

> Error recovery seems to be quite a hard problem.

I think it's quite doable, but a tricky and lengthy process. My hunch is that a mature set of error rules will triple or quadruple the size of the Lojban grammar.

> As a first step you could try modifying an existing PEG parser to
> produce simple error messages. As far as I know none of the existing
> ones do this yet.

Well, there are a couple of ways in which PEG parsers themselves can produce error messages, but the parsers are extremely simple, so there's really not much room there. I think most of the work has to be done in the grammar itself.

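
To make the error-rule idea above concrete, here is a minimal sketch in Python. It uses a toy grammar for comma-separated numbers, not Lojban, and hand-written recursive descent instead of a real PEG engine. Every name in it (Node, parse_missing_item, collect_errors and so on) is made up for illustration. The point it tries to show is only that the error rule is an ordinary rule that *succeeds*, and that the "this is an error" flag is read afterwards by whoever walks the parse tree.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Toy grammar (an assumption for illustration, not Lojban):
#   list         <- item (',' item)*
#   item         <- number / missing_item
#   number       <- [0-9]+
#   missing_item <- &(',' / end-of-input)     (flagged as an error rule)

@dataclass
class Node:
    rule: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)
    error: Optional[str] = None  # set only on nodes produced by error rules

NUMBER = re.compile(r"[0-9]+")

def parse_number(text: str, pos: int) -> Optional[Node]:
    m = NUMBER.match(text, pos)
    return Node("number", pos, m.end()) if m else None

def parse_missing_item(text: str, pos: int) -> Optional[Node]:
    # Error rule: matches the empty string where a number was expected
    # but a comma or the end of the input follows.
    if pos == len(text) or text[pos] == ",":
        return Node("missing_item", pos, pos, error="expected a number")
    return None

def parse_item(text: str, pos: int) -> Optional[Node]:
    # Ordered choice: the normal alternative first, the error rule last.
    node = parse_number(text, pos)
    if node is None:
        node = parse_missing_item(text, pos)
    return node

def parse_list(text: str, pos: int = 0) -> Optional[Node]:
    first = parse_item(text, pos)
    if first is None:
        return None
    children = [first]
    pos = first.end
    while pos < len(text) and text[pos] == ",":
        item = parse_item(text, pos + 1)
        if item is None:
            break  # an ordinary failure: the repetition just stops here
        children.append(item)
        pos = item.end
    return Node("list", first.start, pos, children)

def collect_errors(node: Node) -> List[Tuple[int, str]]:
    # The parser never looks at `error`; only the tree consumer does.
    found = [(node.start, node.error)] if node.error else []
    for child in node.children:
        found.extend(collect_errors(child))
    return found

print(collect_errors(parse_list("1,2")))   # []
print(collect_errors(parse_list("1,,3")))  # [(2, 'expected a number')]
```

Note that "1,,3" is parsed all the way to the end, with the mistake recorded as a node in the tree. An input that no error rule anticipates, such as "1,x", still stops early (the returned node ends at offset 1), which is why a mature set of error rules would need to be large.
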
On the other hand, I *am* planning on doing a lot of experimenting with simple PEG grammars to prove the general concept, and to teach myself how PEG grammars are written, before I do anything with Lojban.

> My parser just tells you that the parse failed and the
> point at which it failed.

What are your criteria for this? Just that the input stream wasn't fully eaten?

I think that ideally, you'd be confident enough in your grammar that you could pick a starting rule that isn't supposed to parse to the end of the stream, and apply it repeatedly to the input stream. Any errors would then show up as matched error rules in the parse tree rather than as incompletely parsed input. This could be a great memory optimization for suitable input, because the memoization cache can be cleared after each substructure is parsed. For Lojban, this basically amounts to calling the parser with the 'bridi' rule instead of the 'text' rule.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)