Date: Wed, 24 Nov 2010 08:55:58 -0700
From: ".alyn.post."
To: lojban@googlegroups.com
Subject: Re: [lojban] NORATS, SPACE, and PUBLIC in PEG grammar
Message-ID: <20101124155558.GC12462@alice.local>
In-Reply-To: <20101124081733.GF9301@digitalkingdom.org>

On Wed, Nov 24, 2010 at 12:17:33AM -0800, Robin Lee Powell wrote:
> On Tue, Nov 23, 2010 at 12:25:23PM -0700, .alyn.post. wrote:
> > The bootstrap compiler is compiling the morphology and morphology
> > header file, but I'm still working on the peg grammar itself.
>
> Damn. That's a lot of work; good luck!
>

Thank you! I'm compiling the grammar file now, minus the morphology
interface section. From here it looks like writing the morphology
interface section will be more work than getting a PEG parser
bootstrapped, ha!

I'm also down to string-encoding issues in the comparison between my
hand-written bootstrap and the PEG parser it is compiling. That class
of problem could be in either file, so I saved it for the end.

> > Given that Lojban is used as an example of a complex PEG grammar:
> >
> > http://en.wikipedia.org/wiki/Parsing_expression_grammar#External_links
>
> Lojban is almost certainly the most complex fully regular (except
> ZOi) grammar in actual use in the world. The only time you might
> get something worse is regularized versions of natlang grammars.
> Lojban's grammar is something like 10x the size of most programming
> languages.
>
> > I'm not sure it's a bad idea to have a peg parser generator
> > written specifically to parse Lojban.
>
> It's certainly a great test-to-destruction choice. :) Throw the
> entirety of {la .alis.} at it in one pass, for example. :)
>

I'm actually using that as my litmus test for success. My goal,
starting the project, was to be able to parse all of {la .alis.} in
one go, even if it requires so much memory that I have to use the
128GB RAM Linux box here at my office to do it.
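The message doesn't say how this parser is structured. If it is a packrat parser (a PEG parser that memoizes the result of every rule at every input position, which is the standard way to keep PEG backtracking linear-time), then the memo table is what grows with input size: roughly one entry per rule per position tried. The Python sketch below is a toy illustration under that assumption only; the `Packrat` class, the combinator names (`lit`, `seq`, `choice`, `star`), and the two-word grammar are all hypothetical and unrelated to camxes or the real Lojban grammar.

```python
from typing import Callable, Dict, Optional, Tuple

# A parser takes (text, pos) and returns the end position on success, None on failure.
Parser = Callable[[str, int], Optional[int]]


class Packrat:
    """Hypothetical toy packrat parser: memoizes every (rule, position) result."""

    def __init__(self) -> None:
        self.rules: Dict[str, Parser] = {}
        self.memo: Dict[Tuple[str, int], Optional[int]] = {}

    def rule(self, name: str, body: Parser) -> None:
        self.rules[name] = body

    def call(self, name: str) -> Parser:
        # Reference a named rule; its result at each position is computed once.
        def parse(text: str, pos: int) -> Optional[int]:
            key = (name, pos)
            if key not in self.memo:
                self.memo[key] = self.rules[name](text, pos)
            return self.memo[key]
        return parse

    def parse(self, start: str, text: str) -> Optional[int]:
        self.memo.clear()  # memo entries are only valid for a single input string
        return self.call(start)(text, 0)


def lit(s: str) -> Parser:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def seq(*parts: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parts:
            end = p(text, pos)
            if end is None:
                return None
            pos = end
        return pos
    return parse


def choice(*alts: Parser) -> Parser:
    # PEG ordered choice: the first alternative that succeeds wins.
    def parse(text: str, pos: int) -> Optional[int]:
        for p in alts:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def star(p: Parser) -> Parser:
    # Greedy zero-or-more; stops on failure or on a match that consumes nothing.
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            end = p(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


# Toy grammar (not Lojban): text <- word (space word)*
g = Packrat()
g.rule("word", choice(lit("ba"), lit("da")))
g.rule("space", lit(" "))
g.rule("text", seq(g.call("word"), star(seq(g.call("space"), g.call("word")))))

for n in (10, 1000):
    s = " ".join(["ba"] * n)
    end = g.parse("text", s)
    print(n, end == len(s), len(g.memo))
# prints:
# 10 True 21
# 1000 True 2001
```

For an input of n words this three-rule grammar leaves 2n+1 memo entries. A grammar with many more rules tried at each position multiplies that accordingly, which is why parsing a whole book in one pass is as much a memory question as a speed one.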
I've been mindful about memory usage and performance in writing the
parser, as this project isn't an academic exercise for me. I think
we should be able to parse book-sized inputs. That or get rid of
ZOI. ;-p

I've got smaller milestones to pass first, of course, like the test
sentences you've got for camxes and smaller works like my own {lo do
ckiku ma zvati}. But {la .alis.} is certainly the big prize. :-D

> > I do wish there had been something available already, but I'm not
> > aware of Scheme code that parses PEG files--they all seem to want
> > to write the grammar definition in Scheme itself.
>
> Well, you could always write a pre-processor to output Scheme from a
> common PEG format.
>
> Honestly, whatever we end up with in terms of the PEG grammar we
> declare as the formalized This ... Is ... Lojban!!! (assuming we do
> so), it's going to be "wrong" in the sense that you'll have to
> process it to get a working input file for whatever parser generator
> you're *actually* using. I don't really see any way to avoid that,
> although the NORATS and so on were intended to encode some
> meta-parser sorts of information about certain productions.
>

Do you think it is better for the LLG to publish a PEG file that
requires work to use at all, or to publish a reference implementation
that introduces more dependencies than a PEG specification but is
closer to something "working"? (Or, as always, secret option #3:
ignoring my false dichotomy and giving an answer unconstrained by the
phrasing of my question.)

Do you think your opinion differs from what the LLG would decide?

Thank you for spending time answering my questions; it has really
accelerated my progress in writing this parser.

-Alan
-- 
.i ko djuno fi le do sevzi