From nobody@digitalkingdom.org Thu Nov 06 00:52:35 2008
Received: with ECARTIS (v1.0.0; list lojban-list); Thu, 06 Nov 2008 00:52:35 -0800 (PST)
Received: from nobody by chain.digitalkingdom.org with local (Exim 4.69)	(envelope-from <nobody@digitalkingdom.org>)	id 1Ky0bT-000732-95	for lojban-list-real@lojban.org; Thu, 06 Nov 2008 00:52:35 -0800
Received: from fk-out-0910.google.com ([209.85.128.184])	by chain.digitalkingdom.org with esmtp (Exim 4.69)	(envelope-from <dbrockman@gmail.com>)	id 1Ky0bN-00072W-T1	for lojban-list@lojban.org; Thu, 06 Nov 2008 00:52:35 -0800
Received: by fk-out-0910.google.com with SMTP id 18so547974fks.2        for <lojban-list@lojban.org>; Thu, 06 Nov 2008 00:52:28 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=gmail.com; s=gamma;        h=domainkey-signature:received:received:message-id:date:from:sender         :to:subject:in-reply-to:mime-version:content-type:references         :x-google-sender-auth;        bh=ssOkSuiJ1zpxEz3ulLYZNTRFMx3Amxe6h4Y9uCwAQA0=;        b=Zf7LTY5N9aFqN7SoypR04EmA1tag3SV4IjaOPZhJPalVqZrNvZf1jxIfsqZnrBv0K+         6GuF084ftRS43yXL8A8Jg9+zVfehQ04RVE2VnMaXbvXAs6YlR5PUZcs1zUo6rpNwRtng         SAAnNufzlfSJiIHaZFYeilPspXpAzI6QKvKKI=
DomainKey-Signature: a=rsa-sha1; c=nofws;        d=gmail.com; s=gamma;        h=message-id:date:from:sender:to:subject:in-reply-to:mime-version         :content-type:references:x-google-sender-auth;        b=UBO9N6LcNh/3fLbsoqrMC3dcIXd3t62Ac49JtISLnMwx2nybOI7D0ilxccOrpOzGKZ         kyHswOyO4w7hGVT/2buQgYAnqh554J/8jP9WnqfkhCtV2tZTQKhi8wzkNzfM470Bza1G         6h3FKKXDDA4mHBMmvdFwhMVWFi9yy7nav/YW8=
Received: by 10.180.203.3 with SMTP id a3mr572874bkg.146.1225961548065;        Thu, 06 Nov 2008 00:52:28 -0800 (PST)
Received: by 10.181.1.5 with HTTP; Thu, 6 Nov 2008 00:52:28 -0800 (PST)
Message-ID: <a36d16c80811060052l46e4ca70o2c7bf28ba573e2bd@mail.gmail.com>
Date: Thu, 6 Nov 2008 09:52:28 +0100
From: "Daniel Brockman" <daniel@brockman.se>
To: lojban-list@lojban.org
Subject: [lojban] Re: experimental cmavo in lojgloss.
In-Reply-To: <737b61f30811051630t6adad5e0x54456e789d70c5b@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_23553_17015168.1225961548079"
References: <737b61f30811022128n9e8692evefaa820062d2a652@mail.gmail.com>	 <925d17560811031040t402eb7a9k31e0d61bf7ca3cea@mail.gmail.com>	 <a36d16c80811040135yaee90cfg66e6e510e0e5302a@mail.gmail.com>	 <925d17560811040350g2a04db8ewd2f34a8a43d96767@mail.gmail.com>	 <a36d16c80811040536t6d7dbf8eh3907bfcfe6e492bd@mail.gmail.com>	 <737b61f30811041523o3574936fp27dea91b6a058c26@mail.gmail.com>	 <a36d16c80811050158h616ec382x404d43a69a5063d4@mail.gmail.com>	 <737b61f30811050534i514b3fddv197b2a07a47655f9@mail.gmail.com>	 <a36d16c80811051607h648fee69t44ddf4c84a5f019f@mail.gmail.com>	 <737b61f30811051630t6adad5e0x54456e789d70c5b@mail.gmail.com>
X-Google-Sender-Auth: c95a46519b37b96f
X-Spam-Score: 0.0
X-Spam-Score-Int: 0
X-Spam-Bar: /
X-archive-position: 14933
X-ecartis-version: Ecartis v1.0.0
Sender: lojban-list-bounce@lojban.org
Errors-to: lojban-list-bounce@lojban.org
X-original-sender: daniel@brockman.se
Precedence: bulk
Reply-to: lojban-list@lojban.org
X-list: lojban-list

------=_Part_23553_17015168.1225961548079
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Thu, Nov 6, 2008 at 1:30 AM, Chris Capel <pdf23ds@gmail.com> wrote:

> On Wed, Nov 5, 2008 at 18:07, Daniel Brockman <daniel@gointeractive.se> wrote:
> >> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
> >> > just treat it as a self-contained construct that requires morphologically
> >> > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
> >> > correct Lojban before it (just like everything else).
> >>
> >> How far before it? Up to the beginning of the sentence? The statement?
> >
> > The {le'ai} construct doesn't care about ANYTHING else.  However your parser
> > works, that's how it works before {le'ai}.
>
> I don't understand. You're saying that if there's a lo'ai then
> everything before it in the text should get only a syntactical parse,
> not a grammatical parse? If not, there has to be some cutoff.


Syntax and grammar are one and the same thing to me, so I don't understand
the distinction.


> >> > Of course it would require extraordinary methods to get things like
> >> > {kwama lo'ai kwama sa'ai klama le'ai} --- or why not
> >> > {fsen.45ynl5tnerg98ehg4n su coi} --- to parse.  It's not practical and
> >> > not cost-efficient.  The {kjama} example falls in this category because
> >> > {kj} is morphologically invalid.
> >>
> >> Hmm. I think you overestimate the difference in effort between the two
> >> implementations. They both require the same tricks, just at a slightly
> >> different level in the grammar.
> >
> > What are you talking about?  One implementation is self-contained; the other
> > requires lots of weird backtracking and re-parsing and weird, weird stuff.
>
> No, both require backtracking (but not reparsing, since this is a
> packrat parser) and lots of lookahead that's usually wasted (but
> hopefully fast). You have to check every sentence (or whatever) for
> lo'ai before the main grammar parse, whether you do it before or after
> the morph parse. If you want to see how that's implemented, take a
> look at SA. Now, SA has a lot more complicated grammar, so lo'ai would
> be easier to implement even using the same technique. (And contrary to
> Jorge, I'm not too sure it would introduce any weird interactions with
> the SA machinery.)


I'm still not getting through.  We are talking about two different things.


> > It doesn't matter if it has the same parse tree.  It only matters that it
> > PARSES IN ANY WAY.  If it does, then the parser will be able to continue.
> > If it doesn't, then the parser will die.
>
> I'm more concerned about interactive parsing where parse errors aren't
> a huge deal, especially because you get detailed and helpful error
> information, much, much better than jbofi'e, to help you find the
> problem.
>
> I think perhaps a better (simple) way to handle lo'ai is to treat it
> similar to a plain-old lo'u - le'u quote. Still have it behave like a
> UI, but only morph parse the words until the le'ai. In fact, I imagine
> a number of experimental cmavo that create new selmaho could be
> handled cursorily as quotes of this kind. It's not ideal, but it
> allows a non-expert user to modify the parser with configuration to
> handle text using these cmavo better than before.
>

Yes, this is what I've been trying to say.  Thank you.  Just handle it like
a parenthetical expression.
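
To make that concrete, here is a rough Python sketch of the simple approach.
It is only an illustration under my own assumptions: the input is a list of
words that has already passed the morphology pass, and the function name and
the "CORRECTION" tuple are made up for this example, not taken from any
existing parser.

```python
# Hypothetical sketch: group each {lo'ai .. sa'ai .. le'ai} span into one
# opaque item, so the main grammar can treat it like a parenthetical.
# Assumes `words` is a list of already morphologically valid Lojban words.

def group_corrections(words):
    out = []
    i = 0
    while i < len(words):
        if words[i] == "lo'ai":
            try:
                # Find the closing le'ai; nothing in between is parsed.
                end = words.index("le'ai", i + 1)
            except ValueError:
                raise ValueError("lo'ai without a closing le'ai")
            inner = words[i + 1:end]
            if "sa'ai" in inner:
                k = inner.index("sa'ai")
                wrong, right = inner[:k], inner[k + 1:]
            else:
                wrong, right = inner, []
            out.append(("CORRECTION", tuple(wrong), tuple(right)))
            i = end + 1
        else:
            out.append(words[i])
            i += 1
    return out
```

For example, group_corrections("mi klama lo'ai klama sa'ai kelci le'ai".split())
gives ['mi', 'klama', ('CORRECTION', ('klama',), ('kelci',))].  The words
before the construct are left alone, and nothing is replaced.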

The more complicated implementation that actually replaces at parse time is
another discussion (which I've been trying to avoid in order to keep this
simple, but by all means continue if it is interesting to you).

I'm not even sure I'd want my parser to erase and replace stuff.  I consider
an erasure or a replacement to be an additional utterance that is often best
understood as such.  It would even be interesting to make a parser that
could parse through errors and resynchronize later (e.g., when {.i} is
encountered), and things like that.
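
A very rough Python sketch of that resynchronizing idea follows.  It is a
simplification under my own assumptions: the text is already split into words,
every {.i} is treated as a sentence boundary (which ignores quotes and other
places where {.i} can occur), and parse_sentence is a hypothetical callback
that raises ValueError on a parse error.

```python
# Hypothetical sketch: parse sentence by sentence, and after a parse error
# resynchronize at the next {.i} instead of giving up on the whole text.

def parse_with_resync(words, parse_sentence):
    results = []
    chunk = []
    # A trailing ".i" sentinel flushes the last sentence.
    for w in list(words) + [".i"]:
        if w == ".i":
            if chunk:
                try:
                    results.append(("ok", parse_sentence(chunk)))
                except ValueError as err:
                    # Record the error and carry on with the next sentence.
                    results.append(("error", str(err)))
            chunk = []
        else:
            chunk.append(w)
    return results
```

The point is only that one bad sentence does not stop the parser from
reporting on the rest of the text.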

Anyway, I'm in over my head.  I'm not a parser expert.

-- 
Daniel Brockman
daniel@brockman.se

------=_Part_23553_17015168.1225961548079--

