From lojban+bncCLr6ktCfBBDkqt7lBBoErrCR_g@googlegroups.com Thu Oct 14 16:42:45 2010 Received: from mail-pv0-f189.google.com ([74.125.83.189]) by chain.digitalkingdom.org with esmtp (Exim 4.72) (envelope-from ) id 1P6XRa-0000BB-H1; Thu, 14 Oct 2010 16:42:45 -0700 Received: by pvc22 with SMTP id 22sf77960pvc.16 for ; Thu, 14 Oct 2010 16:42:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlegroups.com; s=beta; h=domainkey-signature:received:x-beenthere:received:received:received :received:received-spf:received:received:received:date:from:to :subject:message-id:mail-followup-to:references:mime-version :in-reply-to:x-original-sender:x-original-authentication-results :reply-to:precedence:mailing-list:list-id:list-post:list-help :list-archive:sender:list-subscribe:list-unsubscribe:content-type :content-disposition:content-transfer-encoding; bh=BpgoVqfh6H2SA3GlQKPCy2cOdeCYlB6OplCY1UF5wng=; b=dDPB5nd6Tt04OmKg3TDzGJ4gL9p2AggfD+jNTVIHGqJUAgVJVk3Vi/+W/o+6N/qXPP QDLAccRxkv4Yx/GxWtXQRWe3zcBzF9Yc5WoxSTTySJJHsWpHkZ81OVpgVXCvpp+6cyHQ Cdba6XCFXS8k4wVKtDY65IQM3kPWLt6DDuN5k= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlegroups.com; s=beta; h=x-beenthere:received-spf:date:from:to:subject:message-id :mail-followup-to:references:mime-version:in-reply-to :x-original-sender:x-original-authentication-results:reply-to :precedence:mailing-list:list-id:list-post:list-help:list-archive :sender:list-subscribe:list-unsubscribe:content-type :content-disposition:content-transfer-encoding; b=SewDn2Wjb7saVAxrQQ0OiQ58piSaGoSwzyq+C7F6C2s44nacrJPauXRX6VXVyxIZMA RzMbU0zm1zTBMOVg7NPzqKJQmBuwau2u+ZvoSEcxMQ5tA7DEqB37bOTWOa+wBv3vSJCU p5tzZ5pZJ8OsWNNR8bGl1nTyvvEyrUSQNMRRE= Received: by 10.142.118.15 with SMTP id q15mr558120wfc.18.1287099748280; Thu, 14 Oct 2010 16:42:28 -0700 (PDT) X-BeenThere: lojban@googlegroups.com Received: by 10.142.70.10 with SMTP id s10ls1654717wfa.1.p; Thu, 14 Oct 2010 16:42:27 -0700 (PDT) Received: by 10.142.143.9 with SMTP id q9mr3738187wfd.25.1287099747311; Thu, 14 Oct 2010 16:42:27 -0700 (PDT) Received: by 10.142.143.9 with SMTP id q9mr3738184wfd.25.1287099747163; Thu, 14 Oct 2010 16:42:27 -0700 (PDT) Received: from mail-px0-f172.google.com (mail-px0-f172.google.com [209.85.212.172]) by gmr-mx.google.com with ESMTP id h6si15943420wfj.5.2010.10.14.16.42.27; Thu, 14 Oct 2010 16:42:27 -0700 (PDT) Received-SPF: neutral (google.com: 209.85.212.172 is neither permitted nor denied by best guess record for domain of alanpost@sunflowerriver.org) client-ip=209.85.212.172; Received: by pxi1 with SMTP id 1so26578pxi.31 for ; Thu, 14 Oct 2010 16:42:27 -0700 (PDT) Received: by 10.142.11.4 with SMTP id 4mr9458469wfk.174.1287099747030; Thu, 14 Oct 2010 16:42:27 -0700 (PDT) Received: from sunflowerriver.org (c-68-35-167-179.hsd1.nm.comcast.net [68.35.167.179]) by mx.google.com with ESMTPS id w42sm5816939wfh.15.2010.10.14.16.42.24 (version=TLSv1/SSLv3 cipher=RC4-MD5); Thu, 14 Oct 2010 16:42:25 -0700 (PDT) Date: Thu, 14 Oct 2010 17:42:21 -0600 From: ".alyn.post." To: lojban@googlegroups.com Subject: Re: [lojban] Questions on isolating utterances before completely parsing Message-ID: <20101014234221.GC2916@alice.local> Mail-Followup-To: lojban@googlegroups.com References: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com> Mime-Version: 1.0 In-Reply-To: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com> X-Original-Sender: alyn.post@lodockikumazvati.org X-Original-Authentication-Results: gmr-mx.google.com; spf=neutral (google.com: 209.85.212.172 is neither permitted nor denied by best guess record for domain of alanpost@sunflowerriver.org) smtp.mail=alanpost@sunflowerriver.org Reply-To: lojban@googlegroups.com Precedence: list Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com List-ID: List-Post: , List-Help: , List-Archive: Sender: lojban@googlegroups.com List-Subscribe: , List-Unsubscribe: , Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 14, 2010 at 04:13:49PM -0700, symuyn wrote: > I've got a hypothetical problem. It's pretty long, but please bear > with me. >=20 > Let's say that, hypothetically, someone is creating a text editor for > Lojban, one which shows the syntactical structure of whatever you've > typed *while you type*. The text would be displayed somewhat like > this: >=20 > =E2=80=B9mi =E2=80=B9=E2=80=B9klama klama=E2=80=BA =E2=80=B9klama bo k= lama=E2=80=BA=E2=80=BA=E2=80=BA >=20 > Let's also imagine, hypothetically, that this person has made the > editor pre-parse all whitespace/dot-separated chunks of text into the > valsi that the chunks correspond to, identifying their selma'o and all > that (e.g. "melo" =E2=86=92 [<"me" in ME> <"lo" in LE>]). This is before > checking the grammar of the text. >=20 > So this hypothetical text editor uses two parsers right now: a chunks- > of-text-to-valsi parser and a sequence-of-valsi-to-textual-structures > parser. >=20 > Let's also say that, hypothetically, in testing this text editor, that > this person encountered a problem. >=20 > The hypothetical text editor becomes slower and slower when the text > grows in size. This is because, unfortunately, the entire text has to > be parsed whenever a new word is added or existing text is deleted. >=20 > What to do? The person hypothetically comes up with an idea! There > could be a *third* parser between the already existing two parsers, > one that converts sequences of valsi into unparsed utterances! The > third parser would ignore everything except I, NIhO, LU, LIhU, TO, > TOI, TUhE, and TUhU, using those selma'o to create a tree of unparsed > utterances. >=20 > For instance, the third parser would convert the sequence of valsi [i > cusku lu klama i klama li'u to mi cusku toi i cusku] into [[i cusku lu > [[klama] [i klama]] li'u to [mi cusku] toi] [i cusku]]. >=20 > Therefore, with this new parser, the hypothetical editor can keep > track of what the boundaries of the utterance *currently being edited* > is, and re-parse *only the current utterance* when it's edited. >=20 > But then, the person finds a problem with that solution! A fatal flaw: > *LIhU, TOI, and TUhE are elidable*. >=20 > Because of that, it seems that it's impossible to isolate an utterance > from the text it is in without parsing the whole text for complete > grammar. >=20 > That's the end of the hypothetical situation. My questions are as > following: >=20 > * Is it true that the fact that LIhU, TOI, and TUhE are elidable makes > isolating an utterance impossible without completely parsing the text > the utterance is in? (Just making sure.) > * Should the person make the third parser anyway while making LIhU, > TOI, and TUhE *required and non-elidable*? > * Is there another practical solution for the editor? >=20 > Remember, the problem is that the hypothetical text editor is getting > slow because otherwise it needs to parse the entire text for every > edit. >=20 My first thought, in reading this, is that what you really want is to have the state of the program (parse tree and token stream) saved at the place right before the place the person makes the edit. Then, after the edit is made, you restore the program to the state it was right before it parsed the text just edited, and continue parsing, picking up the newly edited text. This is called a continuation.[1] Of course, you don't know where the user is going to be editing, so you have to save multiple continuations during the parse, and restart whichever one is closest and earlier than the edit, and throw away the continuations that are now invalid because they occured later in the text. This doesn't have to be an excessive number of continuations. You only have to save as many continuations as you need to make your program fast enough. You should have some idea about how much text is "too much" for performance--how much text makes your parser too slow. So you can create continuations every N bytes knowing you'll only have to parse M bytes (where M<=3DN) after any edit. This saves you the work of having three different parsers in exchange for whatever overhead the continuation has. That is going to depend on other factors in your design, but you'd essentially be trading memory for speed. Edits early in the document are of course going to need to reparse everything after the edit, but: * you'll parse the most recently changed text first, and you can provide feedback on this before the parse fully finishes. * Once you get to the next N boundary in your text (accounting for additions and deletions made, of course), you can compare the results from the current parse to the parse tree you got last time, and decide to stop once they start being identical. I can't really imagine implementing this in any other way, but I'm curious to read everyone else's ideas.=20 1: http://en.wikipedia.org/wiki/Continuation -Alan --=20 .i ko djuno fi le do sevzi --=20 You received this message because you are subscribed to the Google Groups "= lojban" group. To post to this group, send email to lojban@googlegroups.com. To unsubscribe from this group, send email to lojban+unsubscribe@googlegrou= ps.com. For more options, visit this group at http://groups.google.com/group/lojban= ?hl=3Den.