Re: [lojban] Questions on isolating utterances before completely parsing

On Thu, Oct 14, 2010 at 5:13 PM, symuyn <rbysamppi@gmail.com> wrote:

I've got a hypothetical problem. It's pretty long, but please bear
with me.

Let's say that, hypothetically, someone is creating a text editor for
Lojban, one which shows the syntactical structure of whatever you've
typed *while you type*. The text would be displayed somewhat like
this:

‹mi ‹‹klama klama› ‹klama bo klama›››

Let's also imagine, hypothetically, that this person has made the
editor pre-parse all whitespace/dot-separated chunks of text into the
valsi that the chunks correspond to, identifying their selma'o and all
that (e.g. "melo" → [<"me" in ME> <"lo" in LE>]). This is before
checking the grammar of the text.

So this hypothetical text editor uses two parsers right now: a chunks-
of-text-to-valsi parser and a sequence-of-valsi-to-textual-structures
parser.

Let's also say that, hypothetically, this person encountered a problem
while testing the editor.

The hypothetical text editor becomes slower and slower when the text
grows in size. This is because, unfortunately, the entire text has to
be parsed whenever a new word is added or existing text is deleted.

What to do? The person hypothetically comes up with an idea! There
could be a *third* parser between the already existing two parsers,
one that converts sequences of valsi into unparsed utterances! The
third parser would ignore everything except I, NIhO, LU, LIhU, TO,
TOI, TUhE, and TUhU, using those selma'o to create a tree of unparsed
utterances.

For instance, the third parser would convert the sequence of valsi [i
cusku lu klama i klama li'u to mi cusku toi i cusku] into [[i cusku lu
[[klama] [i klama]] li'u to [mi cusku] toi] [i cusku]].
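A minimal Python sketch of such a third parser, assuming its input is already a list of valsi (here just a whitespace-split string). The `SELMAHO` table is hypothetical and covers only the cmavo the segmenter cares about. Two behaviours are assumptions of this sketch rather than anything stated above: a construct is closed early if a terminator belonging to an enclosing construct turns up (so an elided TOI before LIhU is tolerated), and the end of input closes everything still open. Every embedded text, including the TO one, comes out as a list of utterances, so `mi cusku` appears as `[['mi', 'cusku']]`.

```python
from typing import List, Tuple, Union

Node = Union[str, list]

# Hypothetical, deliberately partial selma'o table: only the cmavo this
# segmenter looks at. Every other word passes through untouched.
SELMAHO = {
    "i": "I",
    "ni'o": "NIhO", "no'i": "NIhO",
    "lu": "LU", "li'u": "LIhU",
    "to": "TO", "to'i": "TO", "toi": "TOI",
    "tu'e": "TUhE", "tu'u": "TUhU",
}
OPENER_TO_CLOSER = {"LU": "LIhU", "TO": "TOI", "TUhE": "TUhU"}


def segment_text(words: List[str], pos: int = 0,
                 enclosing: Tuple[str, ...] = ()) -> Tuple[List[list], int]:
    """Group words into unparsed utterances, starting at words[pos].

    `enclosing` lists the terminator selma'o of every construct we are
    inside. Meeting any of them ends this text, so an elided inner
    terminator is tolerated. End of input closes everything still open.
    """
    utterances: List[list] = []
    current: List[Node] = []
    while pos < len(words):
        word = words[pos]
        sel = SELMAHO.get(word)
        if sel in enclosing:
            break  # terminator for this text or for an outer one
        if sel in ("I", "NIhO") and current:
            utterances.append(current)  # I/NIhO starts a new utterance
            current = []
        current.append(word)
        pos += 1
        if sel in OPENER_TO_CLOSER:
            closer = OPENER_TO_CLOSER[sel]
            inner, pos = segment_text(words, pos, enclosing + (closer,))
            current.append(inner)
            # Consume the terminator only if it is ours; if it belongs to
            # an outer construct, ours was elided and we leave it alone.
            if pos < len(words) and SELMAHO.get(words[pos]) == closer:
                current.append(words[pos])
                pos += 1
    if current:
        utterances.append(current)
    return utterances, pos


def segment(source: str) -> List[list]:
    """Whitespace-split `source` and return its tree of utterances."""
    tree, _ = segment_text(source.split())
    return tree


print(segment("i cusku lu klama i klama li'u to mi cusku toi i cusku"))
# [['i', 'cusku', 'lu', [['klama'], ['i', 'klama']], "li'u", 'to', [['mi', 'cusku']], 'toi'], ['i', 'cusku']]
```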

Therefore, with this new parser, the hypothetical editor can keep
track of where the boundaries of the utterance *currently being edited*
are, and re-parse *only the current utterance* when it's edited.
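A variation on that idea, sketched in Python: instead of tracking the cursor position, key a cache on each top-level utterance's words and run the expensive full parse only on utterances that changed. It assumes the third parser has already produced the top-level utterances as tuples of words; `full_parse` is a hypothetical stand-in for the editor's second (valsi to textual structure) parser.

```python
from typing import Callable, Dict, List, Tuple

Utterance = Tuple[str, ...]


class IncrementalParser:
    """Re-run a costly full parser only on utterances whose words changed."""

    def __init__(self, full_parse: Callable[[Utterance], object]) -> None:
        self.full_parse = full_parse  # hypothetical second-stage parser
        self.cache: Dict[Utterance, object] = {}

    def parse(self, utterances: List[Utterance]) -> List[object]:
        new_cache: Dict[Utterance, object] = {}
        results: List[object] = []
        for utt in utterances:
            if utt in new_cache:
                result = new_cache[utt]
            elif utt in self.cache:
                result = self.cache[utt]
            else:
                result = self.full_parse(utt)  # only changed utterances get here
            new_cache[utt] = result
            results.append(result)
        self.cache = new_cache  # forget utterances that no longer exist
        return results
```

Because the key is the utterance's content rather than its position, an edit that splits one utterance in two (by typing an {i}) or merges two still re-parses only the utterances it actually touched.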

But then, the person finds a problem with that solution! A fatal flaw:
*LIhU, TOI, and TUhU are elidable*.

Because of that, it seems that it's impossible to isolate an utterance
from the text it is in without parsing the whole text for complete
grammar.

That's the end of the hypothetical situation. My questions are as
follows:

* Is it true that, because LIhU, TOI, and TUhU are elidable, an
utterance cannot be isolated without completely parsing the text it is
in? (Just making sure.)

I'm not entirely sure what enables those to be elided, but I believe such cases are rare, like only-at-the-end-of-text rare. Also, various people (me, .xorxes., and possibly others I don't know of) feel that they should /never/ be elidable anyway.

Given that, and given that the user is expected to keep typing, it's reasonable for as-you-type parsing to assume that a terminator missing from the text has not been elided but simply hasn't been typed yet: the end of the current input is not assumed to be the end of the text.

In something like {lu ko'a broda to brodi ko'e li'u}, the {li'u} marks the end of the quoted text and thereby also closes the {to}, whose {toi} is elided, so you'd have to allow for that...

* Should the person make the third parser anyway while making LIhU,
TOI, and TUhU *required and non-elidable*?

I say yes, but since that's not official, I should say no. Then again, if the third parser /assumes/ non-elidability, I doubt it will cause problems.

Alternatively, you can have the third parser treat the current end of input as terminating everything that is still unterminated, and that should work out fine.
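A minimal Python sketch of that alternative, written as a pre-pass that makes every terminator explicit before the utterance splitter sees the text. It assumes the usual rule that a terminator closes the innermost open construct waiting for it, so anything opened inside that construct had its own terminator elided; whatever is still open at the end of the current input is closed there. The `OPENERS` table is hypothetical and partial, covering only LU, TO and TUhE.

```python
from typing import List

# Hypothetical, partial table: opener cmavo -> the terminator it expects.
OPENERS = {"lu": "li'u", "to": "toi", "to'i": "toi", "tu'e": "tu'u"}
TERMINATORS = set(OPENERS.values())


def make_terminators_explicit(words: List[str]) -> List[str]:
    """Return a copy of `words` with every elided li'u/toi/tu'u written out."""
    out: List[str] = []
    pending: List[str] = []  # terminators still owed, innermost last
    for word in words:
        if word in TERMINATORS and word in pending:
            # Close everything opened inside the matching construct first.
            while pending[-1] != word:
                out.append(pending.pop())  # re-insert an elided terminator
            pending.pop()
        elif word in OPENERS:
            pending.append(OPENERS[word])
        out.append(word)
    out.extend(reversed(pending))  # end of input closes everything
    return out
```

With this pre-pass, the {lu ko'a broda to brodi ko'e li'u} example becomes {lu ko'a broda to brodi ko'e toi li'u}, and the splitter can treat LIhU, TOI and TUhU as if they were never elidable.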

* Is there another practical solution for the editor?

.alyn.'s idea sounds pretty good to me.

Remember, the problem is that the hypothetical text editor is getting
slow because, without some such solution, it needs to parse the entire
text for every edit.