From: Jonathan Jones <eyeonus@gmail.com>
To: lojban@googlegroups.com
Date: Thu, 14 Oct 2010 17:53:39 -0600
Subject: Re: [lojban] Questions on isolating utterances before completely parsing

On Thu, Oct 14, 2010 at 5:13 PM, symuyn <rbysamppi@gmail.com> wrote:
> I've got a hypothetical problem. It's pretty long, but please bear
> with me.

> Let's say that, hypothetically, someone is creating a text editor for
> Lojban, one which shows the syntactical structure of whatever you've
> typed *while you type*. The text would be displayed somewhat like
> this:

>   ‹mi ‹‹klama klama› ‹klama bo klama›››

> Let's also imagine, hypothetically, that this person has made the
> editor pre-parse all whitespace/dot-separated chunks of text into the
> valsi that the chunks correspond to, identifying their selma'o and all
> that (e.g. "melo" → [<"me" in ME> <"lo" in LE>]). This is before
> checking the grammar of the text.
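A rough sketch of the chunk-to-valsi pre-parse just described, covering only the simplest case: a chunk made entirely of cmavo written together, as in "melo". It assumes that every cmavo after the first one in such a chunk begins with a consonant (vowel-initial words need a pause in front, so they should arrive as separate chunks), and it uses a tiny hypothetical selma'o table; brivla, cmene, and anything missing from the table are out of scope.

```python
import re

# Hypothetical, tiny selma'o table; a real editor would load the full cmavo list.
SELMAHO = {"me": "ME", "lo": "LE", "le": "LE", "i": "I", "je": "JA", "nu": "NU"}

# One cmavo: an optional leading consonant, then vowels and apostrophes.
CMAVO = re.compile(r"[bcdfgjklmnprstvxz]?[aeiouy']+")


def chunk_to_valsi(chunk):
    """Split a chunk written as a run of cmavo (e.g. "melo") into
    (valsi, selma'o) pairs. Assumes the chunk contains only cmavo;
    valsi missing from the table are tagged "?".
    """
    words = CMAVO.findall(chunk.lower())
    return [(w, SELMAHO.get(w, "?")) for w in words]


print(chunk_to_valsi("melo"))   # [('me', 'ME'), ('lo', 'LE')]
```

Characters the pattern does not recognise are silently skipped, so a real implementation would need to report them instead.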

> So this hypothetical text editor uses two parsers right now: a
> chunks-of-text-to-valsi parser and a
> sequence-of-valsi-to-textual-structures parser.

> Let's also say that, hypothetically, this person encountered a problem
> while testing this text editor.

> The hypothetical text editor becomes slower and slower as the text
> grows in size. This is because, unfortunately, the entire text has to
> be re-parsed whenever a new word is added or existing text is deleted.

> What to do? The person hypothetically comes up with an idea! There
> could be a *third* parser between the two existing parsers, one that
> converts sequences of valsi into unparsed utterances! The third parser
> would ignore everything except I, NIhO, LU, LIhU, TO, TOI, TUhE, and
> TUhU, using those selma'o to create a tree of unparsed utterances.

> For instance, the third parser would convert the sequence of valsi
> [i cusku lu klama i klama li'u to mi cusku toi i cusku] into
> [[i cusku lu [[klama] [i klama]] li'u to [mi cusku] toi] [i cusku]].
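A minimal sketch of such a third parser, assuming every LU, TO, and TUhE is followed by its terminator (the non-elidable variant asked about below). The selma'o table is a hypothetical, partial stand-in for the first parser's output. One difference from the notation above: every bracketed text, including the single-utterance TO aside, comes out as a list of utterances, so `to [mi cusku] toi` is produced as `to [[mi cusku]] toi`.

```python
# Hypothetical, partial selma'o table standing in for the first parser's output.
SELMAHO = {
    "i": "I", "ni'o": "NIhO", "no'i": "NIhO",
    "lu": "LU", "li'u": "LIhU",
    "to": "TO", "to'i": "TO", "toi": "TOI",
    "tu'e": "TUhE", "tu'u": "TUhU",
}
# Each opening selma'o and the selma'o of its terminator.
OPENERS = {"LU": "LIhU", "TO": "TOI", "TUhE": "TUhU"}


def segment(words):
    """Turn a sequence of valsi into a tree of unparsed utterances.

    A text is a list of utterances; an utterance is a list whose items are
    valsi or nested texts. Assumes every LU/TO/TUhE has its terminator.
    """
    root = [[]]                # top-level text, starting with one empty utterance
    stack = [(root, None)]     # (text being filled, terminator that closes it)
    for w in words:
        cls = SELMAHO.get(w)
        text, closer = stack[-1]
        if cls in ("I", "NIhO"):
            if text[-1]:       # I/NIhO starts a new utterance unless the current one is empty
                text.append([])
            text[-1].append(w)
        elif cls in OPENERS:
            inner = [[]]
            text[-1].extend([w, inner])   # the opener, then the bracketed text
            stack.append((inner, OPENERS[cls]))
        elif cls is not None and cls == closer:
            stack.pop()                   # back to the text that holds the opener
            stack[-1][0][-1].append(w)    # the terminator follows the bracketed text
        else:
            text[-1].append(w)            # everything else is left unparsed
    return root


words = "i cusku lu klama i klama li'u to mi cusku toi i cusku".split()
print(segment(words))
```

With this sketch a missing terminator leaves the rest of the text inside the bracket, and a stray one is kept as an ordinary valsi, which is exactly where elision causes trouble.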

> Therefore, with this new parser, the hypothetical editor can keep
> track of the boundaries of the utterance *currently being edited*,
> and re-parse *only the current utterance* when it's edited.
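One way to get "re-parse only the current utterance" without explicit boundary bookkeeping is to memoize the grammar parser on each top-level utterance's valsi, so unchanged utterances are never parsed twice. This is a sketch that assumes an utterance parses the same way in isolation (which is exactly what elision puts in doubt); `parse_utterance` is a hypothetical stub for the real second parser.

```python
from functools import lru_cache

parse_calls = 0  # how many times the (expensive) grammar parser really ran


@lru_cache(maxsize=None)
def parse_utterance(valsi):
    """Hypothetical stub for the full grammar parser.

    `valsi` is a tuple so it can be used as a cache key.
    """
    global parse_calls
    parse_calls += 1
    return ("utterance", valsi)   # placeholder for a real parse tree


def parse_text(utterances):
    """Parse every top-level utterance, reusing cached results for any
    utterance whose valsi have not changed since an earlier call."""
    return [parse_utterance(tuple(u)) for u in utterances]


text = [["i", "cusku"], ["i", "klama"], ["i", "mi", "klama"]]
parse_text(text)                    # parses all three utterances
text[1] = ["i", "klama", "klama"]   # the user edits the second utterance
parse_text(text)                    # only the edited utterance is parsed again
print(parse_calls)                  # 4
```

The unbounded cache keeps every old version of every utterance; a long editing session would want a bounded `lru_cache(maxsize=...)` or results stored per utterance position instead.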

> But then, the person finds a problem with that solution! A fatal flaw:
> *LIhU, TOI, and TUhU are elidable*.

> Because of that, it seems that it's impossible to isolate an utterance
> from the text it is in without parsing the whole text for complete
> grammar.

> That's the end of the hypothetical situation. My questions are as
> follows:

> * Is it true that the fact that LIhU, TOI, and TUhU are elidable makes
> isolating an utterance impossible without completely parsing the text
> the utterance is in? (Just making sure.)

I'm not entirely sure what allows those to be elided, but I believe such cases are rare, like only-at-the-end-of-text rare. Also, there are various people (me, .xorxes., and possibly others I don't know of) who feel that they should /never/ be elidable anyway.

Based on that, and since the user is expected to keep typing, it's reasonable for as-you-type parsing to assume that a terminator missing from the text has not been elided, because the end of the current input is assumed not to be the end of the text.

In something like {lu ko'a broda to brodi ko'e li'u}, the {li'u} marks the end of the quoted text, and with it the end of the {to} aside whose {toi} is missing, so you'd have to allow for that.
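Allowing for that case can be done while matching terminators: an explicit terminator closes the nearest open bracket of its own kind, and any brackets opened inside it are treated as having their terminators elided at that point. Below is a sketch of that heuristic (an assumption for illustration, not an official rule) that reports which terminators it had to imply; the table is hypothetical and limited to the bracketing cmavo discussed here.

```python
# Hypothetical table: opening cmavo -> the terminator that closes it.
CLOSER_OF = {"lu": "li'u", "to": "toi", "to'i": "toi", "tu'e": "tu'u"}


def implied_terminators(words):
    """Return (index, terminator) pairs for terminators that must be
    treated as elided just before words[index].

    An explicit terminator closes the innermost open bracket of its kind;
    brackets opened after that one are closed implicitly. Terminators with
    no matching open bracket are ignored.
    """
    open_closers = []   # terminators expected by the brackets still open
    implied = []
    for i, w in enumerate(words):
        if w in CLOSER_OF:
            open_closers.append(CLOSER_OF[w])
        elif w in open_closers:
            while open_closers[-1] != w:          # inner bracket left unterminated
                implied.append((i, open_closers.pop()))
            open_closers.pop()                    # the explicit terminator's own bracket
    return implied


print(implied_terminators("lu ko'a broda to brodi ko'e li'u".split()))
# [(6, 'toi')]
```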

> * Should the person make the third parser anyway while making LIhU,
> TOI, and TUhU *required and non-elidable*?

I say yes, but since that's not official, I should say no. Then again, if the third parser /assumes/ non-elidability, I doubt it will cause problems.

Alternatively, you can have the third parser treat the end of the current input as terminating everything that is still unterminated, and that should work out fine.
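The end-of-input suggestion can be sketched as a pre-pass that appends, innermost first, the terminators of every bracket still open when the input ends, and hands that copy to the parsers instead of the raw input. The function name and table are hypothetical, and the pass assumes that no terminator is elided earlier in the text.

```python
# Hypothetical table: opening cmavo -> the terminator that closes it.
CLOSER_OF = {"lu": "li'u", "to": "toi", "to'i": "toi", "tu'e": "tu'u"}


def close_at_end(words):
    """Return `words` plus the terminators still missing at the end of the
    current input, innermost first, so that the end of the input
    terminates everything left unterminated."""
    open_closers = []                       # expected terminators, innermost last
    for w in words:
        if w in CLOSER_OF:
            open_closers.append(CLOSER_OF[w])
        elif open_closers and w == open_closers[-1]:
            open_closers.pop()              # explicit terminator closes the innermost bracket
    return list(words) + open_closers[::-1]


print(close_at_end("i cusku lu klama to mi".split()))
# ['i', 'cusku', 'lu', 'klama', 'to', 'mi', 'toi', "li'u"]
```

The added terminators exist only in the copy given to the parser; the user's buffer is untouched, so they stop being added once the user types the real ones.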

> * Is there another practical solution for the editor?

.alyn.'s idea sounds pretty good to me.

> Remember, the problem is that the hypothetical text editor is getting
> slow because otherwise it needs to parse the entire text for every
> edit.

Something tells me this "hypothetical" parser isn't very hypothetical. :D

--
mu'o mi'e .aionys.

.i.a'o.e'e ko cmima le bende pe lo pilno be denpa bu .i doi.luk. mi patfu do zo'o
(Come to the Dot Side! Luke, I am your father. :D )

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.