From nobody@digitalkingdom.org Tue Aug 15 20:36:43 2006
Message-ID: <e6663d200608152036s172df93cgd057e73154aa3b4@mail.gmail.com>
Date: Tue, 15 Aug 2006 23:36:21 -0400
From: "Matt Arnold" <matt.mattarn@gmail.com>
To: lojban-list@lojban.org
Subject: [lojban] Re: parsing with error detection and recovery
In-Reply-To: <737b61f30608151434h6ed71ec2k123f043c1ad59838@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <737b61f30608151434h6ed71ec2k123f043c1ad59838@mail.gmail.com>
X-Spam-Score: -2.4 (--)
X-archive-position: 12464
X-ecartis-version: Ecartis v1.0.0
Sender: lojban-list-bounce@lojban.org
Errors-to: lojban-list-bounce@lojban.org
X-original-sender: matt.mattarn@gmail.com
Precedence: bulk
Reply-to: lojban-list@lojban.org
X-list: lojban-list

On 8/15/06, Chris Capel <pdf23ds@gmail.com> wrote:
> I'm looking into implementing a friendly PEG parser. The current PEG
> grammar (and morphology) are very unfriendly, in that invalid lojban
> text is simply not parsable, as opposed to being parsable with
> possible errors listed. But a parser with error detection could be
> easily based on the existing PEG grammars by adding additional rules
> (with lower precedence than any rules for valid Lojban) that are
> specially marked and are associated with descriptive error messages.
> Adding these rules would also add substantial error recovery/tolerance
> to parsers.
>
> For instance, the morphology rules in the BPFK Peg Morphology[1] will
> only parse consonants that don't appear in invalid consonant clusters.
> If a consonant cluster is invalid, it will stop parsing. But by adding
> error rules for consonants that don't check the validity (that only
> get matched if the ones that do check don't match) or that check for
> specific kinds of invalid pairs, the output of the parser could be
> more likely to finish, and could tell the user why the cluster is
> invalid.
>
> Composing a good set of these rules would definitely be quite an art,
> but seems like a good approach.
>
> So, my question is this: is there an easy way to prove the equivalence
> of PEG parser A with the parts of parser B that apply only to valid
> input? My first hunch is that as long as B is derived from A by only
> adding rules where the added rule is an error condition if ever
> matched in an input, and by only modifying existing rules either by
> renaming them (and all references to them) or by adding options to the
> end that point toward error rules, then parser B will return a parse
> tree with no matches on error rules if and only if parser A would be
> able to parse the input at all. But I'm not completely sure that's the
> case.
>
> Chris Capel
>

Chris,
I applaud your efforts. If you can achieve this, it will be the only
parser I use!
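
To make the idea concrete, here is a minimal sketch of "error rules tried
last" in Python. It is only a toy: the two pair restrictions (no doubled
consonant, no voiced/unvoiced mix) are a simplified subset chosen for
illustration, not the BPFK morphology, and every name in it (valid_pair,
error_pair, choice, parse_pairs) is hypothetical.

```python
# Toy illustration only: a simplified subset of Lojban's consonant-pair
# restrictions, not the BPFK morphology. All names here are hypothetical.
UNVOICED = set("ptkfcsx")
VOICED = set("bdgvjz")
CONSONANTS = UNVOICED | VOICED | set("lmnr")


def pair_problem(a, b):
    """Return a message if the pair breaks one of the toy rules, else None."""
    if a == b:
        return "doubled consonant"
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return "voiced/unvoiced mix"
    return None


def valid_pair(text, pos):
    """Strict rule: succeeds only on a permitted consonant pair."""
    pair = text[pos:pos + 2]
    if len(pair) == 2 and all(c in CONSONANTS for c in pair):
        if pair_problem(pair[0], pair[1]) is None:
            return ("pair", pair, None), pos + 2
    return None


def error_pair(text, pos):
    """Error rule: accepts any consonant pair and records why it is invalid."""
    pair = text[pos:pos + 2]
    if len(pair) == 2 and all(c in CONSONANTS for c in pair):
        return ("error-pair", pair, pair_problem(pair[0], pair[1])), pos + 2
    return None


def choice(*rules):
    """PEG ordered choice: the first rule that matches wins."""
    def parse(text, pos):
        for rule in rules:
            result = rule(text, pos)
            if result is not None:
                return result
        return None
    return parse


strict = choice(valid_pair)                # plays the role of parser A
tolerant = choice(valid_pair, error_pair)  # parser B: error rule tried last


def parse_pairs(rule, text):
    """Parse a whole string as a sequence of pairs; None if parsing stops."""
    pos, nodes = 0, []
    while pos < len(text):
        result = rule(text, pos)
        if result is None:
            return None
        node, pos = result
        nodes.append(node)
    return nodes
```

With these definitions, parse_pairs(strict, "stbk") gives up at "bk", while
parse_pairs(tolerant, "stbk") finishes and tags "bk" with the reason. This
toy has no lookahead predicates and no enclosing alternatives to backtrack
into, so it says nothing about whether the equivalence you ask about holds
for a full grammar.
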
-epkat

