Date: Sat, 16 Oct 2010 12:57:01 -0600
From: ".alyn.post."
To: lojban@googlegroups.com
Subject: Re: [lojban] Re: Questions on isolating utterances before completely parsing

On Sat, Oct 16, 2010 at 09:46:00AM -0700, symuyn wrote:
> In reply to Mr. Post, saving and using continuations is a very
> interesting idea, but unfortunately, I don't see how it would be
> practically usable when it comes to editing near the beginning--or even
> middle!--of the document. Hypothetically, if you have a long document,
> editing it even in the middle would take a long time to process for
> each re-parse.
>
> The two points that you give at the end to ameliorate continuations'
> problems are interesting but very difficult, as far as I can tell.
> Perhaps you can give some answers--
>
> Providing feedback during parsing of text downstream of the editing is
> impossible as far as I can tell--every PEG library I know--including the
> ones that I've written--is a sealed black box: once you plug something
> in, you must wait until it finishes getting the result out.
>

The PEG parser I'm using requires you to write a generator for token
input, which is the first place I'd try putting continuations:

  http://gazette.call-cc.org/issues/5.html

Meaning, I'd manage my continuations on the input side, rather than
the output side, because you're right--stuff pops out fully formed.
I suspect I'd have to hack at the parser some after this to make
continuations work as I expected, but I already expected I'd be
learning and fixing this particular parser anyway.

If you can save to and seek back to the input position for a
particular parse, save the state variables of the parser, and save
the syntax tree, it really should be possible. While I have extensive
experience with parsers, I have very little experience with PEG
parsers, so I accept that something about these parsers may make that
difficult.

I wouldn't try to do something like this with a recursive descent
parser, because it saves state on the stack. I might be motivated
enough to convert a recursive descent parser into a continuation
passing style parser, which would allow me to save the stack-based
representation of the parse on the heap if I needed to create a
continuation.

> Comparing parse trees and stopping re-parsing when they're
> sufficiently similar is risky, if there is no way to guarantee that
> the syntax tree is exactly the same all the way to the end *without
> re-parsing the whole thing anyway*. As far as I can tell, just because a
> new parse tree starts to look similar to the original tree, the new
> parse tree is not necessarily identical till the end. (Or is that
> actually a property of the Lojban grammar? If it is, only then should
> early stopping by comparison be used.)
>

You're correct. Sufficiently similar is a heuristic that won't work
for all cases. I was suggesting that this trade-off was better than
the other suggested trade-offs for solving this problem. As you're
the one writing this thing, you get to decide which trade-off you
want to deal with.
;-)

What I was thinking is that the parser would run as a separate
thread, and the parse tree in the main thread would contain a
marshal object at the current parse location. Occasionally this
marshal object would receive more of the parse tree and a new
marshal object, replacing the old marshal object with the new bit
of parse tree and the place where the parser was still working.

I wasn't thinking of the "all-or-nothing" property of PEG parsers,
as this idea does require the PEG parser to check back in with the
caller from time to time.

> However, continuations *are* indeed the only way that I can now think
> of to implement practically such a text editor *without* requiring
> LIhU, etc. to be at the end of multi-utterance texts.
>

I don't elide LIhU, LUhU, TOI, &c. myself.

I know that Eclipse can parse Python (and presumably other languages)
in its editor. I also know that this parser does not behave
identically to Python's own parser, as my coworkers have complained
to me that Eclipse flagged a perfectly valid Python construct as
invalid when in fact Python did something quite useful with the same
construct.

This doesn't stop people from using Eclipse to write Python, and
serves as prior art on how people cope (and programs don't cope) with
this sort of thing. In our case we add little tokens in the code
that Eclipse recognizes but Python doesn't, so Eclipse will shut up
and Python will work.

I am curious to hear how this project goes for you. What platform
are you targeting? What language are you using? What PEG parser
are you trying? No one has tried doing anything like this with
Lojban before, which makes me quite excited.

-Alan
-- 
.i ko djuno fi le do sevzi
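
A rough sketch of the "save the input position, the parser state, and
the syntax tree" idea from the message above, in Python. Everything
here is an assumption made for illustration: the grammar is a toy
(words separated by whitespace, sentences separated by the word ".i"),
not the Lojban PEG grammar, and the names CheckpointingParser,
Checkpoint and words_scanned are hypothetical. It uses plain saved
records rather than real continuations, so it only shows the
seek-back-and-resume part of the proposal.

```python
import re
from bisect import bisect_left
from dataclasses import dataclass

WORD = re.compile(r"\S+")


@dataclass(frozen=True)
class Checkpoint:
    pos: int       # input offset at which parsing may resume
    tree_len: int  # number of finished sentences before that offset


class CheckpointingParser:
    """Toy parser: sentences are runs of words separated by the word '.i'."""

    def __init__(self, text):
        self.text = text
        self.tree = []
        self.checkpoints = [Checkpoint(0, 0)]
        self.words_scanned = 0  # work counter, for demonstration only
        self._parse_from(self.checkpoints[0])

    def _parse_from(self, cp):
        # Discard everything produced after the checkpoint, then resume there.
        del self.tree[cp.tree_len:]
        self.checkpoints = [c for c in self.checkpoints if c.pos <= cp.pos]
        sentence = []
        for match in WORD.finditer(self.text, cp.pos):
            self.words_scanned += 1
            if match.group() == ".i":
                self.tree.append(sentence)
                sentence = []
                self.checkpoints.append(Checkpoint(match.end(), len(self.tree)))
            else:
                sentence.append(match.group())
        self.tree.append(sentence)

    def edit(self, start, end, replacement):
        """Replace text[start:end], then re-parse from the nearest checkpoint."""
        self.text = self.text[:start] + replacement + self.text[end:]
        positions = [c.pos for c in self.checkpoints]
        # Use a checkpoint strictly before the edit, so the separator that
        # produced it is known to be untouched.
        index = max(0, bisect_left(positions, start) - 1)
        self._parse_from(self.checkpoints[index])


parser = CheckpointingParser("mi klama .i do klama .i ti zdani")
before = parser.words_scanned
at = parser.text.rindex("ti")
parser.edit(at, at + 2, "ta")
print(parser.tree)
print("words re-scanned:", parser.words_scanned - before)
```

Note that an edit near the start of the text still re-parses everything
downstream of the chosen checkpoint, which is the objection raised in
the quoted message; checkpoints alone only make edits near the end
cheap.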