Date: Thu, 17 Aug 2006 09:49:12 -0300
From: Jorge Llambías <jjllambias@gmail.com>
Subject: [lojban] Re: parsing with error detection and recovery

On 8/15/06, Chris Capel wrote:
>
> For instance, the morphology rules in the BPFK Peg Morphology[1] will
> only parse consonants that don't appear in invalid consonant clusters.
> If a consonant cluster is invalid, it will stop parsing. But by adding
> error rules for consonants that don't check the validity (that only
> get matched if the ones that do check don't match) or that check for
> specific kinds of invalid pairs, the output of the parser could be
> more likely to finish,

That part seems relatively easy to do:

Define a new top rule:

    tolerant-text <- text / text-without-phonotactic-constraints

Make a copy of the full grammar with each rule name tagged with
"-without-phonotactic-constraints".

Eliminate the phonotactic constraints from the second set of rules.
These appear only in a few rules. For example, instead of:

    c <- comma* [cC] !h !c !s !x !voiced

you will have:

    c-without-phonotactic-constraints <- comma* [cC]

> and could tell the user why the cluster is
> invalid.

That may be harder to achieve.

mu'o mi'e xorxes
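
The tolerant-text fallback described above can be sketched in a few lines of Python. This is only a toy illustration, not the BPFK grammar: the "text" rule here accepts nothing but "c" and vowels, the names parse_text and parse_tolerant_text are made up for the example, and the lookahead set assumes that "h" covers the apostrophe or "h" and that "voiced" means one of b, d, g, j, v, z.

```python
import re

# Toy stand-ins for grammar rules; commas are allowed before each letter,
# as in "comma*". The real grammar has many more rules than this.
VOWEL = re.compile(r",*[aeiou]", re.IGNORECASE)

# Strict variant, modelled on: c <- comma* [cC] !h !c !s !x !voiced
# ASSUMPTION: "h" is ' or h, and "voiced" is one of b d g j v z.
STRICT_C = re.compile(r",*c(?![h'csxbdgjvz])", re.IGNORECASE)

# Tolerant variant, modelled on:
# c-without-phonotactic-constraints <- comma* [cC]
TOLERANT_C = re.compile(r",*c", re.IGNORECASE)


def parse_text(text, c_rule):
    """Toy 'text' rule: (c / vowel)* followed by end of input."""
    pos = 0
    while pos < len(text):
        m = c_rule.match(text, pos) or VOWEL.match(text, pos)
        if m is None:
            return False  # no rule matches here, so the parse fails
        pos = m.end()
    return True


def parse_tolerant_text(text):
    """tolerant-text <- text / text-without-phonotactic-constraints"""
    if parse_text(text, STRICT_C):
        return "strict"
    if parse_text(text, TOLERANT_C):
        return "tolerant"
    return None  # not parseable even without the constraints


if __name__ == "__main__":
    for sample in ("caci", "cca", "xa"):
        print(sample, parse_tolerant_text(sample))
```

Because the strict grammar is tried first, the tolerant copy is only consulted for input that the strict one rejects, which matches the ordered choice in the new top rule. Knowing that the fallback was used says that some constraint was violated, but not which one or where.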