Date: Thu, 21 Jun 2012 09:27:32 -0700
From: Robin Lee Powell
To: lojban@googlegroups.com
Subject: Re: [lojban] Testing Lua/LPeg version of the Lojban PEG
Message-ID: <20120621162732.GY1445@stodi.digitalkingdom.org>

On Thu, Jun 21, 2012 at 11:08:21AM +0300, Veijo Vilva wrote:
> I've now got a preliminary version of the full parser running.
>
> Basically the parser is just the re-formatted PEG, the rest is
> about 150 lines of quite ordinary Lua code for glue between the
> stages, some very small helper functions and pretty printing. The
> source files (the driver program, the morphology PEG and the
> grammar PEG) presently total 2560 lines (incl.
> the comments and the empty separator lines), about 78 kbytes. There
> is no binary for the parser, which is compiled for each run. The
> compilation time is about 100 ms on any decent PC, which has helped
> a lot during the testing and refinement stage.

Nice! Go you.

> I've omitted the erasure handling rules as they seemed to cause
> too much slowdown and rewritten some rules to speed up the
> parsing process.
>
> I've also added rules to bracket sumti tcita and
> zei lujvo. I had to add rules to the morphology PEG in order to
> keep any quoted non-Lojban text intact - now the quoted text is
> sent as a single non-L word to the grammar PEG.

For all the changes that you believe do not change the language,
could you comment on them at
http://www.lojban.org/tiki/BPFK+Section%3A+Formal+Grammar ?

> My present, very simple pretty printer is quite flexible. It can
> produce either the full parse tree, which is probably required
> only for checking the parser, or omit the numbered sub-rules
> (sumti-1,...) or omit any user-defined set of intermediate levels
> from the tree. It would be trivial to add glosses for cmavo and
> gismu to the output. I've also given some thought to passing the
> lujvo split from the morphology PEG.

FWIW, the trick I've used for programmatic tree pruning, which
works very well, is to prune any node that has only one child.
That, IIRC, is the entire difference between camxes and camxes -v.

> I'll have to do some more testing before releasing the program for
> general consumption.

You might as well throw it up on GitHub for people to play with, no?

-Robin
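
P.S. In case it helps, here is a minimal sketch of the single-child pruning rule mentioned above. It is written in Python only for illustration and is not taken from camxes or from the Lua/LPeg parser. The tree shape (a node is a `(rule_name, children)` tuple, a leaf is a plain string), the function name `prune`, and the sample parse are all assumptions made for the example.

```python
# Assumed (hypothetical) tree shape: an interior node is a tuple
# (rule_name, [children]); a leaf is a plain string (the word itself).

def prune(node):
    """Collapse every interior node that has exactly one child."""
    if isinstance(node, str):
        return node                      # leaves are kept as-is
    name, children = node
    pruned = [prune(child) for child in children]
    if len(pruned) == 1:
        return pruned[0]                 # single child: replace node by it
    return (name, pruned)                # branching node: keep its label


# Made-up parse of "mi klama", just to exercise the function.
tree = ("text", [("sentence", [
    ("sumti", [("sumti-1", [("KOhA", ["mi"])])]),
    ("selbri", [("selbri-1", [("BRIVLA", ["klama"])])]),
])])

print(prune(tree))   # prints ('sentence', ['mi', 'klama'])
```

Applied literally, the rule also drops the word-class labels (KOhA, BRIVLA above), since those nodes have one child too. A pretty printer that wants to keep them would need to exempt nodes whose only child is a leaf.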