From: "Chris Capel"
To: lojban-list@lojban.org
Date: Sun, 15 Jun 2008 21:39:09 -0500
Subject: [lojban] PEG grammar issues

I've found a few minor issues with the PEG grammar while working on my parser, and I thought I'd start telling people about them before I forget them.

First, the top-level production should fail if it can't parse the whole string. Currently 'text' ends with EOF?, an optional end-of-input, so it succeeds even when input is left unconsumed. However, it can't simply end with a mandatory EOF, because LU references the 'text' production, and a quotation is followed by "li'u", not by the end of the input. 'text' should be changed to something like:

    text-eof <- text EOF
    text <- intro-null NAI-clause* text-part-2 (!text-1 joik-jek)? text-1? faho-clause

I made this change in the first release of my parser, and it seems to work fine.

Second, selbri-3 should group its selbri-4 children left-associatively. Currently it puts them all into one flat group, which is misleading and possibly wrong, depending on your interpretation. I tried to fix this, but couldn't find a definition that does so without left recursion. So I gave up and added a post-parsing step to my own parser that groups them properly.

Third, tenses that probably ought to be parsed as part of the bridi-tail are currently being parsed as head terms, because of the term-1 production:

    term-1 <- sumti / ( !gek (tag / FA-clause free*) (sumti / KU-clause? free*) ) / termset / NA-clause KU-clause free*

    {mi} {pu}

(Braces mark term-1 matches; angle brackets mark the bridi-tail.)

'term-1' matches "mi", and then it also matches "pu": 'sumti' fails on "pu", but '!gek' and 'tag' succeed, and since 'KU-clause?' and 'free*' can both match nothing, the second alternative of 'term-1' succeeds and consumes the tense as a term.
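The first point (the top-level 'text' production ending in EOF?) is easiest to see with a toy PEG. This is a minimal Python sketch, not the real grammar: parsers are functions over a token list, and the rule 'body' is a hypothetical stand-in for 'text' that just matches repeated "coi". It shows that an optional EOF lets a parse "succeed" with input left over, that a mandatory EOF rejects it, and that an LU-style rule reusing the EOF-free production can still be closed by "li'u".

```python
from typing import Callable, Optional, Sequence

# A parser takes a token list and a start index and returns the index just
# past its match, or None on failure (the usual PEG formulation).
Parser = Callable[[Sequence[str], int], Optional[int]]


def word(w: str) -> Parser:
    """Match one specific token."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        return i + 1 if i < len(toks) and toks[i] == w else None
    return p


def seq(*parts: Parser) -> Parser:
    """Match each part in order; fail if any part fails."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        for part in parts:
            nxt = part(toks, i)
            if nxt is None:
                return None
            i = nxt
        return i
    return p


def many(part: Parser) -> Parser:
    """Greedy zero-or-more repetition (PEG '*'); never fails."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        while True:
            nxt = part(toks, i)
            if nxt is None or nxt == i:  # stop on failure or empty match
                return i
            i = nxt
    return p


def optional(part: Parser) -> Parser:
    """PEG '?': try the part, otherwise succeed without consuming input."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        nxt = part(toks, i)
        return i if nxt is None else nxt
    return p


def eof(toks: Sequence[str], i: int) -> Optional[int]:
    """Succeed only at the end of the input."""
    return i if i == len(toks) else None


# Hypothetical stand-in for 'text': any number of "coi".
body = many(word("coi"))
text_optional_eof = seq(body, optional(eof))    # like 'text' ending in EOF?
text_eof = seq(body, eof)                       # like the proposed 'text-eof'
lu_quote = seq(word("lu"), body, word("li'u"))  # LU reuses 'text', no EOF

leftover = ["coi", "coi", "li'u"]
print(text_optional_eof(leftover, 0))      # 2: "succeeds" with "li'u" unparsed
print(text_eof(leftover, 0))               # None: whole-input parse fails
print(lu_quote(["lu", "coi", "li'u"], 0))  # 3: quotation closed by "li'u"
```

A caller could instead compare the returned index with len(toks), but that check then lives outside the grammar, which is what a separate 'text-eof' rule avoids.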
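For the second point, the usual way around left recursion in a PEG is the post-parsing step described above: match the operands with repetition, then fold the flat list to the left. A minimal Python sketch, assuming each selbri-4 child is represented as a plain string (a real parser's nodes would be richer); the names 'group_left' and 'show' are hypothetical.

```python
from functools import reduce
from typing import List, Tuple, Union

# A grouped node is either a leaf (one selbri-4 match, here just its text)
# or a (left, right) pair built by the grouping step.
Tree = Union[str, Tuple["Tree", "Tree"]]


def group_left(children: List[Tree]) -> Tree:
    """Fold a flat list [a, b, c] into ((a, b), c): left-associative."""
    if not children:
        raise ValueError("expected at least one selbri-4")
    return reduce(lambda left, right: (left, right), children[1:], children[0])


def show(tree: Tree) -> str:
    """Render a grouped tree with square brackets around each pair."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{show(left)} {show(right)}]"


# A flat selbri-3 match, as a repetition-based rule would produce it.
flat = ["barda", "xunre", "gerku"]
print(show(group_left(flat)))  # [[barda xunre] gerku]
```

The fold matches the default left grouping of tanru: "barda xunre gerku" is a (big-red) dog.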
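The third point can be reproduced with a small ordered-choice model. This Python sketch is an assumption-laden toy, not the real grammar: the word classes are hypothetical one-token stand-ins, and '!gek', 'free*', 'termset' and the NA alternative are left out. It compares the second alternative of the original 'term-1' with the 'term-2' variant proposed in this message, which requires a sumti or an explicit KU after a tag.

```python
from typing import Callable, Optional, Sequence

# A parser returns the index just past its match, or None on failure.
Parser = Callable[[Sequence[str], int], Optional[int]]


def one_of(*words: str) -> Parser:
    """Match a single token drawn from a fixed set (a toy selma'o)."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        return i + 1 if i < len(toks) and toks[i] in words else None
    return p


def seq(*parts: Parser) -> Parser:
    """Match each part in order; fail if any part fails."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        for part in parts:
            nxt = part(toks, i)
            if nxt is None:
                return None
            i = nxt
        return i
    return p


def alt(*parts: Parser) -> Parser:
    """PEG ordered choice: the first alternative that succeeds wins."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        for part in parts:
            nxt = part(toks, i)
            if nxt is not None:
                return nxt
        return None
    return p


def optional(part: Parser) -> Parser:
    """PEG '?': try the part, otherwise succeed without consuming input."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        nxt = part(toks, i)
        return i if nxt is None else nxt
    return p


def many(part: Parser) -> Parser:
    """Greedy zero-or-more repetition (PEG '*')."""
    def p(toks: Sequence[str], i: int) -> Optional[int]:
        while True:
            nxt = part(toks, i)
            if nxt is None or nxt == i:
                return i
            i = nxt
    return p


# Hypothetical toy word classes.
sumti = one_of("mi", "do")
tag = one_of("pu", "ba")
ku = one_of("ku")
fa = one_of("fa", "fe")

# Original: sumti / (tag / FA) (sumti / KU?)
old_term = alt(sumti, seq(alt(tag, fa), alt(sumti, optional(ku))))
# Proposed: sumti / tag (sumti / KU) / FA (sumti / KU?)
new_term = alt(sumti,
               seq(tag, alt(sumti, ku)),
               seq(fa, alt(sumti, optional(ku))))

sentence = ["mi", "pu", "klama"]     # "klama" starts the bridi-tail
print(many(old_term)(sentence, 0))   # 2: "pu" swallowed as a head term
print(many(new_term)(sentence, 0))   # 1: "pu" left for the bridi-tail
print(many(new_term)(["mi", "pu", "ku", "klama"], 0))  # 3: "pu ku" is a term
```

Because the empty KU-clause? is what lets a bare tag succeed, making KU mandatory only in the tag branch is enough to leave "pu" for the bridi-tail while keeping "pu ku" and "pu mi" as terms.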
I'm not exactly sure how this one should be fixed, but what about this:

    term-1 <- sumti / term-2 / termset / NA-clause KU-clause free*
    term-2 <- !gek (tag (sumti / KU-clause free*) / FA-clause free* (sumti / KU-clause? free*) )

Here, 'term-2' is the second alternative of the original 'term-1', except that the third item in the sequence has been distributed into each branch of the second, and the ? has been removed from 'KU-clause' in the 'tag' branch. A bare tag followed directly by the selbri therefore no longer counts as a term. It seems to work in my parser!

Fourth, 'term-sa' appears to match only a single term followed by "sa" under some conditions. For instance, it doesn't match this:

    mi ba klama lo sa lo sa do

which one might imagine being said by someone with a stutter. Here's one possible fix:

    term-sa <- term-start (!term-1 (sa-word / SA-clause !term-1) )* SA-clause &term-1

Well, that's about it for now. Number 3 is currently a problem for my parser, as it makes it hard for me to gloss tense cmavo correctly in context. So if anyone sees a problem with my correction, let me know.

Chris Capel
-- 
"What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)
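For the fourth point, here is a hand translation of the proposed 'term-sa' rule into Python, run on the stutter example. It is only a toy under loudly stated assumptions: 'term-1' is approximated by "the next token starts a term" (a hypothetical set of four words), 'sa-word' by "any token other than sa", and the helper 'skip_erased_terms' assumes the enclosing rule applies 'term-sa' repeatedly.

```python
from typing import Optional, Sequence

# Hypothetical toy: term-1 is approximated by "the next token starts a
# term", and sa-word by "any token other than sa".
TERM_START = {"lo", "le", "mi", "do"}


def starts_term(toks: Sequence[str], i: int) -> bool:
    return i < len(toks) and toks[i] in TERM_START


def is_sa(toks: Sequence[str], i: int) -> bool:
    return i < len(toks) and toks[i] == "sa"


def term_sa(toks: Sequence[str], i: int) -> Optional[int]:
    """term-start (!term-1 (sa-word / SA-clause !term-1))* SA-clause &term-1"""
    if not starts_term(toks, i):              # term-start
        return None
    i += 1
    # (!term-1 (sa-word / SA-clause !term-1))*
    while i < len(toks) and not starts_term(toks, i):
        if not is_sa(toks, i):                # sa-word
            i += 1
        elif not starts_term(toks, i + 1):    # SA-clause !term-1
            i += 1
        else:
            break
    # SA-clause &term-1: consume "sa", only peek at the term after it
    if is_sa(toks, i) and starts_term(toks, i + 1):
        return i + 1
    return None


def skip_erased_terms(toks: Sequence[str], i: int) -> int:
    """Apply term_sa repeatedly, as an enclosing term-sa* would (assumed)."""
    while (nxt := term_sa(toks, i)) is not None:
        i = nxt
    return i


toks = "mi ba klama lo sa lo sa do".split()
print(term_sa(toks, 3))            # 5: matches "lo sa"
print(skip_erased_terms(toks, 3))  # 7: both stutters consumed, "do" remains
print(term_sa(toks, 0))            # None: scan stops at "lo", no "sa" there
```

The '!term-1' guard in the loop is what stops a match starting at "mi" from running on into the stuttered terms, so each "lo sa" is matched separately.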