From lojban+bncCLr6ktCfBBDkqt7lBBoErrCR_g@googlegroups.com Thu Oct 14 16:42:45 2010
Received: from mail-pv0-f189.google.com ([74.125.83.189])
	by chain.digitalkingdom.org with esmtp (Exim 4.72)
	(envelope-from <lojban+bncCLr6ktCfBBDkqt7lBBoErrCR_g@googlegroups.com>)
	id 1P6XRa-0000BB-H1; Thu, 14 Oct 2010 16:42:45 -0700
Received: by pvc22 with SMTP id 22sf77960pvc.16
        for <multiple recipients>; Thu, 14 Oct 2010 16:42:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=beta;
        h=domainkey-signature:received:x-beenthere:received:received:received
         :received:received-spf:received:received:received:date:from:to
         :subject:message-id:mail-followup-to:references:mime-version
         :in-reply-to:x-original-sender:x-original-authentication-results
         :reply-to:precedence:mailing-list:list-id:list-post:list-help
         :list-archive:sender:list-subscribe:list-unsubscribe:content-type
         :content-disposition:content-transfer-encoding;
        bh=BpgoVqfh6H2SA3GlQKPCy2cOdeCYlB6OplCY1UF5wng=;
        b=dDPB5nd6Tt04OmKg3TDzGJ4gL9p2AggfD+jNTVIHGqJUAgVJVk3Vi/+W/o+6N/qXPP
         QDLAccRxkv4Yx/GxWtXQRWe3zcBzF9Yc5WoxSTTySJJHsWpHkZ81OVpgVXCvpp+6cyHQ
         Cdba6XCFXS8k4wVKtDY65IQM3kPWLt6DDuN5k=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlegroups.com; s=beta;
        h=x-beenthere:received-spf:date:from:to:subject:message-id
         :mail-followup-to:references:mime-version:in-reply-to
         :x-original-sender:x-original-authentication-results:reply-to
         :precedence:mailing-list:list-id:list-post:list-help:list-archive
         :sender:list-subscribe:list-unsubscribe:content-type
         :content-disposition:content-transfer-encoding;
        b=SewDn2Wjb7saVAxrQQ0OiQ58piSaGoSwzyq+C7F6C2s44nacrJPauXRX6VXVyxIZMA
         RzMbU0zm1zTBMOVg7NPzqKJQmBuwau2u+ZvoSEcxMQ5tA7DEqB37bOTWOa+wBv3vSJCU
         p5tzZ5pZJ8OsWNNR8bGl1nTyvvEyrUSQNMRRE=
Received: by 10.142.118.15 with SMTP id q15mr558120wfc.18.1287099748280;
        Thu, 14 Oct 2010 16:42:28 -0700 (PDT)
X-BeenThere: lojban@googlegroups.com
Received: by 10.142.70.10 with SMTP id s10ls1654717wfa.1.p; Thu, 14 Oct 2010
 16:42:27 -0700 (PDT)
Received: by 10.142.143.9 with SMTP id q9mr3738187wfd.25.1287099747311;
        Thu, 14 Oct 2010 16:42:27 -0700 (PDT)
Received: by 10.142.143.9 with SMTP id q9mr3738184wfd.25.1287099747163;
        Thu, 14 Oct 2010 16:42:27 -0700 (PDT)
Received: from mail-px0-f172.google.com (mail-px0-f172.google.com [209.85.212.172])
        by gmr-mx.google.com with ESMTP id h6si15943420wfj.5.2010.10.14.16.42.27;
        Thu, 14 Oct 2010 16:42:27 -0700 (PDT)
Received-SPF: neutral (google.com: 209.85.212.172 is neither permitted nor denied by best guess record for domain of alanpost@sunflowerriver.org) client-ip=209.85.212.172;
Received: by pxi1 with SMTP id 1so26578pxi.31
        for <lojban@googlegroups.com>; Thu, 14 Oct 2010 16:42:27 -0700 (PDT)
Received: by 10.142.11.4 with SMTP id 4mr9458469wfk.174.1287099747030;
        Thu, 14 Oct 2010 16:42:27 -0700 (PDT)
Received: from sunflowerriver.org (c-68-35-167-179.hsd1.nm.comcast.net [68.35.167.179])
        by mx.google.com with ESMTPS id w42sm5816939wfh.15.2010.10.14.16.42.24
        (version=TLSv1/SSLv3 cipher=RC4-MD5);
        Thu, 14 Oct 2010 16:42:25 -0700 (PDT)
Date: Thu, 14 Oct 2010 17:42:21 -0600
From: ".alyn.post." <alyn.post@lodockikumazvati.org>
To: lojban@googlegroups.com
Subject: Re: [lojban] Questions on isolating utterances before completely parsing
Message-ID: <20101014234221.GC2916@alice.local>
Mail-Followup-To: lojban@googlegroups.com
References: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com>
Mime-Version: 1.0
In-Reply-To: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com>
X-Original-Sender: alyn.post@lodockikumazvati.org
X-Original-Authentication-Results: gmr-mx.google.com; spf=neutral (google.com:
 209.85.212.172 is neither permitted nor denied by best guess record for
 domain of alanpost@sunflowerriver.org) smtp.mail=alanpost@sunflowerriver.org
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID: <lojban.googlegroups.com>
List-Post: <http://groups.google.com/group/lojban/post?hl=en_US>, <mailto:lojban@googlegroups.com>
List-Help: <http://groups.google.com/support/?hl=en_US>, <mailto:lojban+help@googlegroups.com>
List-Archive: <http://groups.google.com/group/lojban?hl=en_US>
Sender: lojban@googlegroups.com
List-Subscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+subscribe@googlegroups.com>
List-Unsubscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+unsubscribe@googlegroups.com>
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Oct 14, 2010 at 04:13:49PM -0700, symuyn wrote:
> I've got a hypothetical problem. It's pretty long, but please bear
> with me.
>=20
> Let's say that, hypothetically, someone is creating a text editor for
> Lojban, one which shows the syntactical structure of whatever you've
> typed *while you type*. The text would be displayed somewhat like
> this:
>=20
>    =E2=80=B9mi =E2=80=B9=E2=80=B9klama klama=E2=80=BA =E2=80=B9klama bo k=
lama=E2=80=BA=E2=80=BA=E2=80=BA
>=20
> Let's also imagine, hypothetically, that this person has made the
> editor pre-parse all whitespace/dot-separated chunks of text into the
> valsi that the chunks correspond to, identifying their selma'o and all
> that (e.g. "melo" =E2=86=92 [<"me" in ME> <"lo" in LE>]). This is before
> checking the grammar of the text.
>=20
> So this hypothetical text editor uses two parsers right now: a chunks-
> of-text-to-valsi parser and a sequence-of-valsi-to-textual-structures
> parser.
>=20
> Let's also say that, hypothetically, in testing this text editor, that
> this person encountered a problem.
>=20
> The hypothetical text editor becomes slower and slower when the text
> grows in size. This is because, unfortunately, the entire text has to
> be parsed whenever a new word is added or existing text is deleted.
>=20
> What to do? The person hypothetically comes up with an idea! There
> could be a *third* parser between the already existing two parsers,
> one that converts sequences of valsi into unparsed utterances! The
> third parser would ignore everything except I, NIhO, LU, LIhU, TO,
> TOI, TUhE, and TUhU, using those selma'o to create a tree of unparsed
> utterances.
>=20
> For instance, the third parser would convert the sequence of valsi [i
> cusku lu klama i klama li'u to mi cusku toi i cusku] into [[i cusku lu
> [[klama] [i klama]] li'u to [mi cusku] toi] [i cusku]].
>=20
> Therefore, with this new parser, the hypothetical editor can keep
> track of what the boundaries of the utterance *currently being edited*
> is, and re-parse *only the current utterance* when it's edited.
>=20
> But then, the person finds a problem with that solution! A fatal flaw:
> *LIhU, TOI, and TUhE are elidable*.
>=20
> Because of that, it seems that it's impossible to isolate an utterance
> from the text it is in without parsing the whole text for complete
> grammar.
>=20
> That's the end of the hypothetical situation. My questions are as
> following:
>=20
> * Is it true that the fact that LIhU, TOI, and TUhE are elidable makes
> isolating an utterance impossible without completely parsing the text
> the utterance is in? (Just making sure.)
> * Should the person make the third parser anyway while making LIhU,
> TOI, and TUhE *required and non-elidable*?
> * Is there another practical solution for the editor?
>=20
> Remember, the problem is that the hypothetical text editor is getting
> slow because otherwise it needs to parse the entire text for every
> edit.
>=20

My first thought, in reading this, is that what you really want is
to have the state of the program (parse tree and token stream) saved
at the place right before the place the person makes the edit.

Then, after the edit is made, you restore the program to the state
it was right before it parsed the text just edited, and continue
parsing, picking up the newly edited text.  This is called a
continuation.[1]

Of course, you don't know where the user is going to be editing, so
you have to save multiple continuations during the parse, and restart
whichever one is closest and earlier than the edit, and throw away
the continuations that are now invalid because they occured later in
the text.

This doesn't have to be an excessive number of continuations.  You only
have to save as many continuations as you need to make your program fast
enough.  You should have some idea about how much text is "too much" for
performance--how much text makes your parser too slow.  So you can
create continuations every N bytes knowing you'll only have to parse M
bytes (where M<=3DN) after any edit.

This saves you the work of having three different parsers in
exchange for whatever overhead the continuation has.  That is going to
depend on other factors in your design, but you'd essentially be
trading memory for speed.

Edits early in the document are of course going to need to reparse
everything after the edit, but:

 * you'll parse the most recently changed text first, and you can
   provide feedback on this before the parse fully finishes.
 * Once you get to the next N boundary in your text (accounting for
   additions and deletions made, of course), you can compare the results
   from the current parse to the parse tree you got last time, and decide
   to stop once they start being identical.

I can't really imagine implementing this in any other way, but I'm
curious to read everyone else's ideas.=20

1: http://en.wikipedia.org/wiki/Continuation

-Alan
--=20
.i ko djuno fi le do sevzi

--=20
You received this message because you are subscribed to the Google Groups "=
lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegrou=
ps.com.
For more options, visit this group at http://groups.google.com/group/lojban=
?hl=3Den.