From lojban+bncCMHEmaCOBhCys97lBBoEOMWN2Q@googlegroups.com Thu Oct 14 17:01:09 2010
MIME-Version: 1.0
In-Reply-To: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com>
References: <385d6b2f-c484-494b-9241-6d7429ce0ec3@p20g2000prf.googlegroups.com>
Date: Thu, 14 Oct 2010 17:53:39 -0600
Message-ID: <AANLkTinzf+_jsQ82aFkQRM71dyZ-7Ji4++curTGu_=0Z@mail.gmail.com>
Subject: Re: [lojban] Questions on isolating utterances before completely parsing
From: Jonathan Jones <eyeonus@gmail.com>
To: lojban@googlegroups.com
Content-Type: multipart/alternative; boundary=90e6ba6e871e4c4ada04929c6bde

--90e6ba6e871e4c4ada04929c6bde
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Thu, Oct 14, 2010 at 5:13 PM, symuyn <rbysamppi@gmail.com> wrote:

> I've got a hypothetical problem. It's pretty long, but please bear
> with me.
>
> Let's say that, hypothetically, someone is creating a text editor for
> Lojban, one which shows the syntactical structure of whatever you've
> typed *while you type*. The text would be displayed somewhat like
> this:
>
>   ‹mi ‹‹klama klama› ‹klama bo klama›››
>
> Let's also imagine, hypothetically, that this person has made the
> editor pre-parse all whitespace/dot-separated chunks of text into the
> valsi that the chunks correspond to, identifying their selma'o and all
> that (e.g. "melo" → [<"me" in ME> <"lo" in LE>]). This is before
> checking the grammar of the text.
>
> So this hypothetical text editor uses two parsers right now: a chunks-
> of-text-to-valsi parser and a sequence-of-valsi-to-textual-structures
> parser.
>
> Let's also say that, hypothetically, in testing this text editor,
> this person encountered a problem.
>
> The hypothetical text editor becomes slower and slower when the text
> grows in size. This is because, unfortunately, the entire text has to
> be parsed whenever a new word is added or existing text is deleted.
>
> What to do? The person hypothetically comes up with an idea! There
> could be a *third* parser between the already existing two parsers,
> one that converts sequences of valsi into unparsed utterances! The
> third parser would ignore everything except I, NIhO, LU, LIhU, TO,
> TOI, TUhE, and TUhU, using those selma'o to create a tree of unparsed
> utterances.
>
> For instance, the third parser would convert the sequence of valsi [i
> cusku lu klama i klama li'u to mi cusku toi i cusku] into [[i cusku lu
> [[klama] [i klama]] li'u to [mi cusku] toi] [i cusku]].
>
> Therefore, with this new parser, the hypothetical editor can keep
> track of what the boundaries of the utterance *currently being edited*
> are, and re-parse *only the current utterance* when it's edited.
>
> But then, the person finds a problem with that solution! A fatal flaw:
> *LIhU, TOI, and TUhU are elidable*.
>
> Because of that, it seems that it's impossible to isolate an utterance
> from the text it is in without parsing the whole text for complete
> grammar.
>
> That's the end of the hypothetical situation. My questions are as
> follows:
>
> * Is it true that the fact that LIhU, TOI, and TUhU are elidable makes
> isolating an utterance impossible without completely parsing the text
> the utterance is in? (Just making sure.)
>

I'm not entirely sure what enables those to be elided, but I believe that
such cases are rare, like only-at-the-end-of-text rare. Also, several people
(me, .xorxes., and possibly others I don't know of) feel that they should
/never/ be elidable anyway.

Given that, and given that the user is expected to keep typing, it's
reasonable for as-you-type parsing to assume that a terminator missing from
the text hasn't been elided, because the end of the current input is assumed
not to be the end of the text.

In something like {lu ko'a broda to brodi ko'e li'u}, the {li'u} marks the
end of the quoted text even though the {to} was never closed with {toi}, so
you'd have to allow for that.
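
A minimal sketch of that kind of third parser, in Python, might look like the
block below. It is not anyone's actual implementation: the names chunk,
OPENERS, CLOSERS and SEPARATORS are made up for illustration, valsi are
assumed to arrive as plain strings, and the output shape (every bracketed
region becomes a list of utterances) is my own simplification. It assumes
terminators are normally written out, but lets an outer closer such as {li'u}
also close anything still open inside it. It does not handle {zo}, {zoi} or
compound connectives like {.ije}.

```python
# Hypothetical opener -> terminator table for the bracketing selma'o.
OPENERS = {"lu": "li'u", "to": "toi", "tu'e": "tu'u"}
CLOSERS = set(OPENERS.values())
SEPARATORS = {"i", "ni'o"}


def chunk(valsi):
    """Group a flat list of valsi into a nested list of unparsed utterances.

    A text is a list of utterances; an utterance is a list whose items are
    either valsi (str) or a nested text (list of utterances).
    """
    root = [[]]
    # Each stack entry is (terminator we are waiting for, text being filled).
    stack = [(None, root)]
    for word in valsi:
        text = stack[-1][1]
        if word in SEPARATORS:
            # A separator starts a new utterance unless the current one is empty.
            if text[-1]:
                text.append([])
            text[-1].append(word)
        elif word in OPENERS:
            inner = [[]]
            text[-1].append(word)
            text[-1].append(inner)
            stack.append((OPENERS[word], inner))
        elif word in CLOSERS and any(closer == word for closer, _ in stack):
            # Closing an outer bracket also closes everything opened inside it.
            while stack[-1][0] != word:
                stack.pop()
            stack.pop()
            stack[-1][1][-1].append(word)
        else:
            # Ordinary valsi, or a closer with no matching opener.
            text[-1].append(word)
    return root


if __name__ == "__main__":
    print(chunk("lu ko'a broda to brodi ko'e li'u".split()))
```

With this shape, the editor would only need to find the innermost utterance
list containing the cursor and hand that to the full grammar parser.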


> * Should the person make the third parser anyway while making LIhU,
> TOI, and TUhU *required and non-elidable*?
>

I say yes, but since that's not official, I should say no. Then again, if
the third parser /assumes/ non-elidability, I doubt it will cause problems.

Alternatively, you can have the third parser treat the current end of input
as always meaning terminate-everything-unterminated, and that should work
out fine.
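
As a rough illustration of the end-of-input rule, the Python sketch below
works out which terminators would have to be supplied when the input stops.
The function name implied_terminators and the OPENERS table are hypothetical,
and valsi are again assumed to be plain strings.

```python
# Hypothetical opener -> terminator table for the bracketing selma'o.
OPENERS = {"lu": "li'u", "to": "toi", "tu'e": "tu'u"}


def implied_terminators(valsi):
    """Return the terminators still owed at end of input, innermost first."""
    pending = []  # terminators we are still waiting for, outermost first
    for word in valsi:
        if word in OPENERS:
            pending.append(OPENERS[word])
        elif word in pending:
            # An explicit closer also closes anything opened after its opener.
            while pending.pop() != word:
                pass
    return pending[::-1]


if __name__ == "__main__":
    print(implied_terminators("i cusku lu klama to mi cusku".split()))
```

Appending the returned valsi to the current input gives the full parser a
text in which nothing is left unterminated, without the user having typed
the terminators yet.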


> * Is there another practical solution for the editor?
>

.alyn.'s idea sounds pretty good to me.


> Remember, the problem is that the hypothetical text editor is getting
> slow because otherwise it needs to parse the entire text for every
> edit.
>

Something tells me this "hypothetical" parser isn't very hypothetical. :D

-- 
mu'o mi'e .aionys.

.i.a'o.e'e ko cmima le bende pe lo pilno be denpa bu .i doi.luk. mi patfu do
zo'o
(Come to the Dot Side! Luke, I am your father. :D )



--90e6ba6e871e4c4ada04929c6bde--