From lojban+bncCKatz7nFHBDJreflBBoEB3841Q@googlegroups.com Sat Oct 16 09:46:18 2010
Received: from mail-pv0-f189.google.com ([74.125.83.189])
	by chain.digitalkingdom.org with esmtp (Exim 4.72)
	(envelope-from <lojban+bncCKatz7nFHBDJreflBBoEB3841Q@googlegroups.com>)
	id 1P79tc-0002n9-Rw; Sat, 16 Oct 2010 09:46:18 -0700
Received: by pvc22 with SMTP id 22sf1102201pvc.16
        for <multiple recipients>; Sat, 16 Oct 2010 09:46:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=beta;
        h=domainkey-signature:received:x-beenthere:received:mime-version
         :received:received:date:in-reply-to:x-ip:references:user-agent
         :x-http-useragent:message-id:subject:from:to:x-original-sender
         :reply-to:precedence:mailing-list:list-id:list-post:list-help
         :list-archive:sender:list-subscribe:list-unsubscribe:content-type
         :content-transfer-encoding;
        bh=fozPKkd6O1YLF6NQDQqi3zow+Jp1ZMLqZVKJ/VRBRCI=;
        b=2DGStDzSt575M2m7hhHtiEt9DB2uN788ujOHFMWQOs8MdXbv8EUm0BnlKHvlWd/fbT
         pTQ3s0Qbdo283+IA7d6SYM9FXhMThfAjfRtvrsXIUpc+oB5euDoy4vgo8hgouiEKtQSI
         w9zbezufXrCysEmK+FSQdK/QVcjmnjQsLhaP8=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlegroups.com; s=beta;
        h=x-beenthere:mime-version:date:in-reply-to:x-ip:references
         :user-agent:x-http-useragent:message-id:subject:from:to
         :x-original-sender:reply-to:precedence:mailing-list:list-id
         :list-post:list-help:list-archive:sender:list-subscribe
         :list-unsubscribe:content-type:content-transfer-encoding;
        b=cGQuyHrb7EQ/Mdyh+iJdt0v8WCphTxgrqxk6AXV9h4h7OWbrcWYQYgGuV2oOgk6GNk
         QG4m7ZVjqp4J7fv5ZEHNHgETgZ5p54sd0qlcaRkZa+IyI3hkSxtUewI530uHSGbsck8B
         M438VoQl0k5ufKVRL0302+Torb14VdBbmTJaw=
Received: by 10.142.149.20 with SMTP id w20mr60543wfd.17.1287247561567;
        Sat, 16 Oct 2010 09:46:01 -0700 (PDT)
X-BeenThere: lojban@googlegroups.com
Received: by 10.142.70.10 with SMTP id s10ls4217519wfa.1.p; Sat, 16 Oct 2010
 09:46:00 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.142.250.2 with SMTP id x2mr61585wfh.30.1287247560826; Sat, 16
 Oct 2010 09:46:00 -0700 (PDT)
Received: by e22g2000prj.googlegroups.com with HTTP; Sat, 16 Oct 2010 09:46:00
 -0700 (PDT)
Date: Sat, 16 Oct 2010 09:46:00 -0700 (PDT)
In-Reply-To: <20101014234221.GC2916@alice.local>
X-IP: 70.162.0.8
References: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com>
 <20101014234221.GC2916@alice.local>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us)
 AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5,gzip(gfe)
Message-ID: <d555294f-7d33-4c29-89e8-9fdf5f745b1f@e22g2000prj.googlegroups.com>
Subject: [lojban] Re: Questions on isolating utterances before completely parsing
From: symuyn <rbysamppi@gmail.com>
To: lojban <lojban@googlegroups.com>
X-Original-Sender: rbysamppi@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID: <lojban.googlegroups.com>
List-Post: <http://groups.google.com/group/lojban/post?hl=en_US>, <mailto:lojban@googlegroups.com>
List-Help: <http://groups.google.com/support/?hl=en_US>, <mailto:lojban+help@googlegroups.com>
List-Archive: <http://groups.google.com/group/lojban?hl=en_US>
Sender: lojban@googlegroups.com
List-Subscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+subscribe@googlegroups.com>
List-Unsubscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+unsubscribe@googlegroups.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Oct 14, 4:42=C2=A0pm, ".alyn.post." <alyn.p...@lodockikumazvati.org>
wrote:
> On Thu, Oct 14, 2010 at 04:13:49PM -0700, symuyn wrote:
> > I've got a hypothetical problem. It's pretty long, but please bear
> > with me.
>
> > Let's say that, hypothetically, someone is creating a text editor for
> > Lojban, one which shows the syntactical structure of whatever you've
> > typed *while you type*. The text would be displayed somewhat like
> > this:
>
> > =C2=A0 =C2=A0=E2=80=B9mi =E2=80=B9=E2=80=B9klama klama=E2=80=BA =E2=80=
=B9klama bo klama=E2=80=BA=E2=80=BA=E2=80=BA
>
> > Let's also imagine, hypothetically, that this person has made the
> > editor pre-parse all whitespace/dot-separated chunks of text into the
> > valsi that the chunks correspond to, identifying their selma'o and all
> > that (e.g. "melo" =E2=86=92 [<"me" in ME> <"lo" in LE>]). This is befor=
e
> > checking the grammar of the text.
>
> > So this hypothetical text editor uses two parsers right now: a chunks-
> > of-text-to-valsi parser and a sequence-of-valsi-to-textual-structures
> > parser.
>
> > Let's also say that, hypothetically, in testing this text editor, that
> > this person encountered a problem.
>
> > The hypothetical text editor becomes slower and slower when the text
> > grows in size. This is because, unfortunately, the entire text has to
> > be parsed whenever a new word is added or existing text is deleted.
>
> > What to do? The person hypothetically comes up with an idea! There
> > could be a *third* parser between the already existing two parsers,
> > one that converts sequences of valsi into unparsed utterances! The
> > third parser would ignore everything except I, NIhO, LU, LIhU, TO,
> > TOI, TUhE, and TUhU, using those selma'o to create a tree of unparsed
> > utterances.
>
> > For instance, the third parser would convert the sequence of valsi [i
> > cusku lu klama i klama li'u to mi cusku toi i cusku] into [[i cusku lu
> > [[klama] [i klama]] li'u to [mi cusku] toi] [i cusku]].
>
> > Therefore, with this new parser, the hypothetical editor can keep
> > track of what the boundaries of the utterance *currently being edited*
> > is, and re-parse *only the current utterance* when it's edited.
>
> > But then, the person finds a problem with that solution! A fatal flaw:
> > *LIhU, TOI, and TUhE are elidable*.
>
> > Because of that, it seems that it's impossible to isolate an utterance
> > from the text it is in without parsing the whole text for complete
> > grammar.
>
> > That's the end of the hypothetical situation. My questions are as
> > following:
>
> > * Is it true that the fact that LIhU, TOI, and TUhE are elidable makes
> > isolating an utterance impossible without completely parsing the text
> > the utterance is in? (Just making sure.)
> > * Should the person make the third parser anyway while making LIhU,
> > TOI, and TUhE *required and non-elidable*?
> > * Is there another practical solution for the editor?
>
> > Remember, the problem is that the hypothetical text editor is getting
> > slow because otherwise it needs to parse the entire text for every
> > edit.
>
> My first thought, in reading this, is that what you really want is
> to have the state of the program (parse tree and token stream) saved
> at the place right before the place the person makes the edit.
>
> Then, after the edit is made, you restore the program to the state
> it was right before it parsed the text just edited, and continue
> parsing, picking up the newly edited text. =C2=A0This is called a
> continuation.[1]
>
> Of course, you don't know where the user is going to be editing, so
> you have to save multiple continuations during the parse, and restart
> whichever one is closest and earlier than the edit, and throw away
> the continuations that are now invalid because they occured later in
> the text.
>
> This doesn't have to be an excessive number of continuations. =C2=A0You o=
nly
> have to save as many continuations as you need to make your program fast
> enough. =C2=A0You should have some idea about how much text is "too much"=
 for
> performance--how much text makes your parser too slow. =C2=A0So you can
> create continuations every N bytes knowing you'll only have to parse M
> bytes (where M<=3DN) after any edit.
>
> This saves you the work of having three different parsers in
> exchange for whatever overhead the continuation has. =C2=A0That is going =
to
> depend on other factors in your design, but you'd essentially be
> trading memory for speed.
>
> Edits early in the document are of course going to need to reparse
> everything after the edit, but:
>
> =C2=A0* you'll parse the most recently changed text first, and you can
> =C2=A0 =C2=A0provide feedback on this before the parse fully finishes.
> =C2=A0* Once you get to the next N boundary in your text (accounting for
> =C2=A0 =C2=A0additions and deletions made, of course), you can compare th=
e results
> =C2=A0 =C2=A0from the current parse to the parse tree you got last time, =
and decide
> =C2=A0 =C2=A0to stop once they start being identical.
>
> I can't really imagine implementing this in any other way, but I'm
> curious to read everyone else's ideas.
>
> 1:http://en.wikipedia.org/wiki/Continuation
>
> -Alan
> --
> .i ko djuno fi le do sevzi

Thanks for the replies, everyone.

***

In reply to Mr. Post, saving and using continuations is a very
interesting idea, but unfortunately, I don't see how it would be
practically usable when it comes to editing near the beginning=E2=80=94or e=
ven
middle!=E2=80=94of the document. Hypothetically, if you have a long documen=
t,
editing it even in the middle would take a long time to process for
each re-parse.

The two points that you give at the end to ameliorate continuations'
problems are interesting but very difficult, as far as I can tell.
Perhaps you can give some answers=E2=80=94

Providing feedback during parsing of text downstream of the editing is
impossible as far as I can tell=E2=80=94every PEG library I know=E2=80=94in=
cluding the
ones that I've written=E2=80=94is a sealed black box: once you plug somethi=
ng
in, you must wait until it finishes getting the result out.

Comparing parse trees and stopping re-parsing when they're
sufficiently similar is risky, if there is no way to guarantee that
the syntax tree is exactly the same all the way to the end *without re-
parsing the whole thing anyway*. As far as I can tell, just because a
new parse tree starts to look similar to the original tree, the new
parse tree is not necessarily identical till the end. (Or is that
actually a property of the Lojban grammar? If it is, only then should
early stopping by comparison be used.)

However, continuations *are* indeed the only way that I can now think
of to implement practically such a text editor *without* requiring
LIhU, etc. to be at the end of multi-utterance texts.

***

In reply to Mr. Jones, I'd love to hear from xorxes, and other people,
if eliding LIhU, etc. is "looked down upon". That would definitely
shift my deciding toward prohibiting eliding LIhU, etc. But, if LIhU,
etc. are elided *a lot*, then the text editor ought to be able to
handle that.

The hypothetical text editor really is hypothetical. I have a
rudimentary Lojban parser written in a parser library that I wrote
(and need to finish, bleh), but I am currently still planning it.

***

In reply to Mr. Lopresto, my earlier second question doesn't dispute
that eliding LIhU, etc. is grammatically valid. The second question
asks if, hypothetically, it is *worth* the sacrifice to make them non-
elidable to be able to isolate utterances and re-parse only them
during editing.

You gave an interesting suggestion: "it seems completely fair
(hypothetically) to make a parser that exhibits sub-optimal
performance in those unusual cases (reparsing all of the above bridi,
instead of just the {mi prami do} part, for instance)." Unfortunately,
as far as I can tell it is not practically possible to be able to
parse with elidable LIhU, etc. and have sub-optimal performance in
*only* those unusual cases. The person making the hypothetical text
editor can't switch between a fast parser that can't elide them to a
slow parser that can, because in order to detect when LIhU, etc. has
been elided, you need to parse with the slow parser anyway!

***

I'm still struggling over if continuations vs. non-elidable text
terminators are worth it. Lojbanic robustness vs. performance
robustness.

Using continuations=E2=80=94the only apparent way=E2=80=94would make the te=
xt editor a
lot more complex. But, is it worth making a text editor *at all* if it
can't parse *all* Lojban, including Lojban that validly elides LIhU,
etc.?

I just don't know.

(There is a third option, of course: not implementing LU=E2=80=A6LIhU, TU=
=E2=80=A6
TUhE, etc. at all! But that's definitely unacceptable.)

--=20
You received this message because you are subscribed to the Google Groups "=
lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegrou=
ps.com.
For more options, visit this group at http://groups.google.com/group/lojban=
?hl=3Den.