Subject: [lojban] Re: parsing with error detection and recovery
From: John Leuner
To: lojban-list@lojban.org
Date: Fri, 18 Aug 2006 00:39:02 +0530

> Rules failing would never be an error. There would be a group of rules
> that are flagged in the grammar (interpreted by the user of the parse
> tree, not actually affecting the parser) as being error conditions
> when they *succeed*. Actively matching different ways to mess up the
> input.

This would require a lot of rules. So if these error rules don't
succeed, they will never consume any input?

> Well, there are a couple ways in which PEG parsers themselves can
> produce error messages, but the parsers are extremely simple, so
> there's really not much room there. I think most of the work has to be
> done to the grammar itself. On the other hand, I *am* planning on
> doing a lot of experimenting with simple PEG grammars to prove the
> general concept, and to teach myself how PEG grammars are written,
> before I do anything with Lojban.

Do you think simple rules will be enough to control the error
detection/recovery process? Maybe it will be necessary to write some
logic in a programming language too.

> > My parser just tells you that the parse failed and the
> > point at which it failed.
>
> What are your criteria for this? Just that the input stream wasn't
> fully eaten? I think that ideally, you'd be able to be confident
> enough in your grammar that you could pick a starting rule that isn't
> supposed to parse to the end of the stream, and use it to repeatedly
> apply to an input stream. And that any errors would result in error
> rules in the grammar rather than an incompletely parsed input. This
> could be a great efficiency optimization (for memory usage) for
> suitable input, because the memoization cache can be cleared after
> each substructure is parsed.

If the top-level rule ("text") parses successfully, the parser will
return the start-index and end-index of the matched text. There is also
a convenience method to check whether all the input was matched or not.

I don't see how you could just repeatedly apply a rule; you would have
to skip over whatever was causing it to fail. Finding out what "is
causing it to fail" is very difficult.

> For Lojban, this basically amounts to calling the parser with the
> 'bridi' rule instead of the 'text' rule.

I'm not sure what the implications are of using a rule like "sentence"
instead of "text".

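To make the two positions above concrete, here is a minimal sketch in Python. It is not the parser described in this message, and it is not a Lojban grammar: the toy grammar and the names `sentence`, `error_run`, `match`, `matched_all` and `scan` are all assumptions made up for illustration. It shows a rule that reports start and end indices, a convenience check for whether all input was matched, and a hypothetical "error rule" that succeeds on malformed input so that a sub-rule can be applied repeatedly without getting stuck.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule takes (text, pos) and returns the end index of its match,
# or None on failure. A failed rule consumes no input.
Rule = Callable[[str, int], Optional[int]]

def token(pattern: str) -> Rule:
    """Match a regular expression anchored at the current position."""
    compiled = re.compile(pattern)
    def rule(text: str, pos: int) -> Optional[int]:
        m = compiled.match(text, pos)
        return m.end() if m else None
    return rule

def seq(*rules: Rule) -> Rule:
    """Match every rule in order; fail as a whole if any one fails."""
    def rule(text: str, pos: int) -> Optional[int]:
        for r in rules:
            end = r(text, pos)
            if end is None:
                return None
            pos = end
        return pos
    return rule

def many(inner: Rule) -> Rule:
    """Match the inner rule zero or more times (never fails)."""
    def rule(text: str, pos: int) -> Optional[int]:
        while True:
            end = inner(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return rule

# Toy grammar (an assumption): lowercase words separated by spaces,
# terminated by a full stop.
WORD = token(r"[a-z']+")
SPACE = token(r" +")
STOP = token(r"\.\s*")
sentence = seq(WORD, many(seq(SPACE, WORD)), STOP)

# Hypothetical error rule: it succeeds on any non-empty run of text up to
# and including the next full stop, so it always makes progress.
error_run = token(r"[^.]+\.?\s*|\.\s*")

def match(rule: Rule, text: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the matched text, or None if the rule failed."""
    end = rule(text, start)
    return None if end is None else (start, end)

def matched_all(span: Optional[Tuple[int, int]], text: str) -> bool:
    """Convenience check: did the match consume the whole input?"""
    return span is not None and span[1] == len(text)

def scan(text: str) -> List[Tuple[str, int, int]]:
    """Apply `sentence` repeatedly, falling back to the error rule."""
    results = []
    pos = 0
    while pos < len(text):
        kind, end = "ok", sentence(text, pos)
        if end is None:
            kind, end = "error", error_run(text, pos)
        results.append((kind, pos, end))
        # A memoizing (packrat) parser could discard its cache here,
        # since nothing before `end` will be revisited.
        pos = end
    return results

print(match(sentence, "mi klama."))              # (0, 9)
print(scan("mi klama. mi 42 klama. do klama."))
# [('ok', 0, 10), ('error', 10, 23), ('ok', 23, 32)]
```

Note that the skipping here only works because the toy error rule resynchronizes on a full stop. Deciding where to resynchronize in a real grammar is exactly the hard part raised above, and this sketch does not solve it.
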
John