Re: [lojban] Re: Questions on isolating utterances before completely parsing

On Sat, Oct 16, 2010 at 12:57 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:

On Sat, Oct 16, 2010 at 09:46:00AM -0700, symuyn wrote:
> In reply to Mr. Post, saving and using continuations is a very
> interesting idea, but unfortunately, I don't see how it would be
> practically usable when it comes to editing near the beginning—or even
> middle!—of the document. Hypothetically, if you have a long document,
> editing it even in the middle would take a long time to process for
> each re-parse.
>
> The two points that you give at the end to ameliorate continuations'
> problems are interesting but very difficult, as far as I can tell.
> Perhaps you can give some answers—
>
> Providing feedback during parsing of text downstream of the editing is
> impossible as far as I can tell—every PEG library I know—including the
> ones that I've written—is a sealed black box: once you plug something
> in, you must wait until it finishes getting the result out.
>

The PEG parser I'm using requires you to write a generator for token
input, which is the first place I'd try putting continuations:

http://gazette.call-cc.org/issues/5.html

Meaning, I'd manage my continuations on the input side, rather than
the output side, because you're right--stuff pops out fully formed.
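
A rough sketch of what "managing it on the input side" could look like,
in Python. The names ReplayableTokens, words, mark and reset are all
hypothetical, and this is not the interface of any particular PEG
parser. It only shows a token generator wrapped so that an earlier
input position can be revisited:

```python
def words(text):
    """Hypothetical tokenizer: yields whitespace-separated words."""
    for word in text.split():
        yield word


class ReplayableTokens:
    """Wraps a token generator and buffers what it has yielded,
    so the consumer can return to an earlier position."""

    def __init__(self, source):
        self._source = iter(source)
        self._buffer = []   # every token pulled from the generator so far
        self._pos = 0       # index of the next token to hand out

    def next_token(self):
        # Pull from the generator only when the buffer is exhausted.
        if self._pos == len(self._buffer):
            self._buffer.append(next(self._source))  # StopIteration at end
        tok = self._buffer[self._pos]
        self._pos += 1
        return tok

    def mark(self):
        """Return an opaque position that reset() accepts later."""
        return self._pos

    def reset(self, mark):
        """Rewind (or fast-forward) to a previously marked position."""
        self._pos = mark
```

A parser that only ever calls next_token() does not need to know that
the caller is marking and resetting positions around it.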

I suspect I'd have to hack at the parser some after this to make
continuations work as I expected, but I already expected I'd be
learning and fixing this particular parser anyway.

If you can save to and seek back to the input position for a
particular parse, save the state variables of the parser, and save
the syntax tree, it really should be possible.
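
To make those three saved pieces concrete, here is a minimal Python
sketch. It assumes a toy grammar (an utterance is a run of words ended
by a "." token) with no lookahead, which a real PEG grammar would not
guarantee. Checkpoint, parse_from and resume_after_edit are hypothetical
names, not part of any library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    pos: int      # input position where the next utterance starts
    state: tuple  # parser state variables (here: utterances seen so far)
    tree: tuple   # syntax tree built up to this point


START = Checkpoint(pos=0, state=(0,), tree=())


def parse_from(tokens, cp):
    """Parse from a checkpoint, taking a new checkpoint after each utterance.

    Returns (checkpoints, tree, trailing), where trailing holds any words
    after the last "." that do not yet form a complete utterance.
    """
    checkpoints = []
    pos, count, tree = cp.pos, cp.state[0], cp.tree
    current = []
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == ".":
            tree = tree + (tuple(current),)
            count += 1
            current = []
            checkpoints.append(Checkpoint(pos, (count,), tree))
        else:
            current.append(tok)
    return checkpoints, tree, tuple(current)


def resume_after_edit(new_tokens, old_checkpoints, edit_pos):
    """Re-parse only from the last checkpoint at or before the edit."""
    usable = [cp for cp in old_checkpoints if cp.pos <= edit_pos]
    start = usable[-1] if usable else START
    new_cps, tree, trailing = parse_from(new_tokens, start)
    return usable + new_cps, tree, trailing
```

Storing the whole tree in every checkpoint is wasteful; it is done here
only to keep the sketch short.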

While I have extensive experience with parsers, I have very little
experience with PEG parsers, so I accept that something about these
parsers may make that difficult. I wouldn't try to do something
like this with a recursive descent parser, because it saves state on
the stack. I might be motivated enough to convert a recursive descent
parser into a continuation-passing style parser, which would allow
me to save the stack-based representation of the parse on the heap
if I needed to create a continuation.
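
As an illustration of that conversion, here is a small Python sketch of
a continuation-passing parser for nested parenthesised lists. The
grammar and the names Done, parse_item, parse_items and run are
hypothetical and chosen only for brevity. Every function returns either
a thunk (the rest of the parse, held on the heap as a closure) or a
Done result, and a small loop drives the thunks. The sketch assumes
well-formed input:

```python
from dataclasses import dataclass


@dataclass
class Done:
    tree: object  # finished syntax tree
    pos: int      # input position after the parsed item


def parse_item(tokens, pos, k):
    """Parse one item at pos, then hand (tree, next_pos) to continuation k."""
    if tokens[pos] == "(":
        return lambda: parse_items(tokens, pos + 1, [], k)
    return lambda: k(tokens[pos], pos + 1)


def parse_items(tokens, pos, acc, k):
    """Parse items until the matching ")", collecting them in a new list."""
    if tokens[pos] == ")":
        return lambda: k(acc, pos + 1)
    # acc + [tree] builds a fresh list, so a saved thunk is never mutated.
    return lambda: parse_item(
        tokens, pos,
        lambda tree, nxt: parse_items(tokens, nxt, acc + [tree], k))


def run(step, max_steps=None):
    """Drive thunks until Done, or until max_steps thunks have been called."""
    n = 0
    while not isinstance(step, Done) and (max_steps is None or n < max_steps):
        step = step()
        n += 1
    return step


tokens = "( a ( b c ) d )".split()
start = parse_item(tokens, 0, Done)  # Done(tree, pos) is the final continuation
paused = run(start, max_steps=4)     # an ordinary object we can keep around
final = run(paused)                  # resume the saved parse to completion
```

Because paused is just a closure, it can be stored and resumed more
than once, which is the property a stack-based recursive descent parser
lacks.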

> Comparing parse trees and stopping re-parsing when they're
> sufficiently similar is risky, if there is no way to guarantee that
> the syntax tree is exactly the same all the way to the end *without re-
> parsing the whole thing anyway*. As far as I can tell, just because a
> new parse tree starts to look similar to the original tree, the new
> parse tree is not necessarily identical till the end. (Or is that
> actually a property of the Lojban grammar? If it is, only then should
> early stopping by comparison be used.)
>

You're correct. Sufficiently similar is a heuristic that won't work
for all cases. I was suggesting that this trade-off was better than
the other suggested trade-offs for solving this problem. As you're
the one writing this thing, you get to decide which trade-off you
want to deal with. ;-)

What I was thinking is that the parser would run as a separate
thread, and the parse tree in the main thread would contain in it a
marshall object at the current parse location. Occasionally this
marshall object would receive more of the parse tree and a new
marshall object, replacing the old marshall object with the new bit
of parse tree and the place it was still working.

I wasn't thinking of the "all-or-nothing" property of PEG parsers,
as this idea does require the PEG parser to check back in with the
caller from time to time.
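
A minimal Python sketch of that arrangement, assuming a toy parser that
can report each "."-terminated utterance as soon as it is finished
(exactly the "checking back in" a sealed PEG parser would not do). The
names Pending, parse_worker, absorb and parse_in_background are
hypothetical; Pending plays the role of the placeholder object sitting
at the current parse location:

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Pending:
    """Placeholder marking where the background parse has reached."""
    pos: int


def parse_worker(tokens, out):
    """Toy parser thread: emits (subtree, next_pos) per finished utterance.

    Words after the last "." are dropped in this sketch.
    """
    current = []
    for i, tok in enumerate(tokens):
        if tok == ".":
            out.put((tuple(current), i + 1))
            current = []
        else:
            current.append(tok)
    out.put(None)  # sentinel: nothing more to parse


def absorb(tree, update):
    """Swap the trailing Pending for a new subtree plus a fresh Pending.

    Returns False once the sentinel arrives and the placeholder is removed.
    """
    if update is None:
        tree.pop()
        return False
    subtree, next_pos = update
    tree[-1:] = [subtree, Pending(next_pos)]
    return True


def parse_in_background(tokens):
    updates = queue.Queue()
    tree = [Pending(0)]   # main-thread tree starts as a bare placeholder
    snapshots = []        # what the editor could display after each update
    worker = threading.Thread(target=parse_worker, args=(tokens, updates))
    worker.start()
    while absorb(tree, updates.get()):
        snapshots.append(list(tree))
    worker.join()
    return tree, snapshots
```

Each snapshot is a partial tree ending in a Pending, so the main thread
always knows both what has been parsed and where work is still going on.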

> However, continuations *are* indeed the only way that I can now think
> of to implement practically such a text editor *without* requiring
> LIhU, etc. to be at the end of multi-utterance texts.
>

I don't elide LIhU, LUhU, TOI, &c myself.

I know that Eclipse can parse python (and presumably other languages)
in its editor. I also know that this parser is not identical to the
way that python parses itself, as my coworkers have complained to
me that Eclipse flagged a perfectly valid python construct as
invalid when in fact python did something quite useful with the same
construct.

This doesn't stop people from using Eclipse to write python, and
serves as prior art on how people cope (and programs don't cope) with
this sort of thing.

In our case we add little tokens in the code that Eclipse recognizes
but python doesn't, so Eclipse will shut up and python will work.

I am curious to hear how this project goes for you. What platform
are you targeting? What language are you using? What PEG parser
are you trying? No one has tried doing anything like this with
Lojban before, which makes me quite excited.

-Alan
--
.i ko djuno fi le do sevzi