From: Daniel Brockman <daniel@brockman.se>
To: lojban-list@lojban.org
Date: Thu, 6 Nov 2008 09:52:28 +0100
Subject: [lojban] Re: experimental cmavo in lojgloss.

On Thu, Nov 6, 2008 at 1:30 AM, Chris Capel <pdf23ds@gmail.com> wrote:
> On Wed, Nov 5, 2008 at 18:07, Daniel Brockman <daniel@gointeractive.se> wrote:
> >> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
> >> > just treat it as a self-contained construct that requires morphologically
> >> > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
> >> > correct Lojban before it (just like everything else).
> >>
> >> How far before it? Up to the beginning of the sentence? The statement?
> >
> > The {le'ai} construct doesn't care about ANYTHING else.  However your parser
> > works, that's how it works before {le'ai}.
>
> I don't understand. You're saying that if there's a lo'ai then
> everything before it in the text should get only a syntactical parse,
> not a grammatical parse? If not, there has to be some cutoff.

Syntax and grammar are one and the same thing to me, so I don't understand the distinction.

> >> > Of course it would require extraordinary methods to get things like
> >> > {kwama lo'ai kwama sa'ai klama le'ai} --- or why not
> >> > {fsen.45ynl5tnerg98ehg4n su coi} --- to parse.  It's not practical and
> >> > not cost-efficient.  The {kjama} example falls in this category because
> >> > {kj} is morphologically invalid.
> >>
> >> Hmm. I think you overestimate the difference in effort between the two
> >> implementations. They both require the same tricks, just at a slightly
> >> different level in the grammar.
> >
> > What are you talking about?  One implementation is self-contained; the other
> > requires lots of weird backtracking and re-parsing and weird, weird stuff.
>
> No, both require backtracking (but not reparsing, since this is a
> packrat parser) and lots of lookahead that's usually wasted (but
> hopefully fast). You have to check every sentence (or whatever) for
> lo'ai before the main grammar parse, whether you do it before or after
> the morph parse. If you want to see how that's implemented, take a
> look at SA. Now, SA has a lot more complicated grammar, so lo'ai would
> be easier to implement even using the same technique. (And contrary to
> Jorge, I'm not too sure it would introduce any weird interactions with
> the SA machinery.)

I'm still not getting through.  We are talking about two different things.

> > It doesn't matter if it has the same parse tree.  It only matters that it
> > PARSES IN ANY WAY.  If it does, then the parser will be able to continue.
> > If it doesn't, then the parser will die.
>
> I'm more concerned about interactive parsing where parse errors aren't
> a huge deal, especially because you get detailed and helpful error
> information, much, much better than jbofi'e, to help you find the
> problem.
>
> I think perhaps a better (simple) way to handle lo'ai is to treat it
> similar to a plain-old lo'u - le'u quote. Still have it behave like a
> UI, but only morph parse the words until the le'ai. In fact, I imagine
> a number of experimental cmavo that create new selmaho could be
> handled cursorily as quotes of this kind. It's not ideal, but it
> allows a non-expert user to modify the parser with configuration to
> handle text using these cmavo better than before.

Yes, this is what I've been trying to say.  Thank you.  Just handle it like a parenthetical expression.
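
A minimal sketch of that simple handling, in Python. It assumes the text has already been split into morphologically valid words; the function name and the "LOhAI" token tag are made up for illustration and do not come from any existing Lojban parser.

```python
def group_lohai(words):
    """Fold each {lo'ai ... le'ai} span into one opaque token.

    The words inside the span are kept as a plain list and are never
    handed to the grammar, much like the contents of a {lo'u ... le'u}
    quote. Everything outside the span is passed through unchanged.
    """
    tokens = []
    i = 0
    while i < len(words):
        if words[i] != "lo'ai":
            tokens.append(words[i])
            i += 1
            continue
        try:
            # Find the first le'ai after the opening lo'ai.
            end = words.index("le'ai", i + 1)
        except ValueError:
            raise SyntaxError("lo'ai without a closing le'ai") from None
        # Keep the inner words (including any sa'ai) as opaque data.
        tokens.append(("LOhAI", words[i + 1:end]))
        i = end + 1
    return tokens


if __name__ == "__main__":
    print(group_lohai("mi klama lo'ai klama sa'ai cadzu le'ai".split()))
```

The grammar proper would then only need one extra rule that accepts a "LOhAI" token wherever it accepts a parenthetical; nothing before the span has to be re-examined.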

The more complicated implementation that actually replaces at parse time is another discussion (which I've been trying to avoid in order to keep this simple, but by all means continue if it is interesting to you).

I'm not even sure I'd want my parser to erase and replace stuff.  I consider an erasure or a replacement to be an additional utterance that is often best understood as such.  It would even be interesting to make a parser that could parse through errors and resynchronize later (e.g., when {.i} is encountered), and things like that.
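
To make the resynchronization idea concrete, here is a rough Python sketch. It assumes the word list can simply be cut at every {.i}, which ignores cases such as {.i} inside a quote, and `parse_sentence` is a hypothetical callback standing in for a real grammar that raises SyntaxError on failure.

```python
def split_sentences(words):
    """Cut a word list into chunks at every {.i} separator."""
    chunks, current = [], []
    for word in words:
        if word == ".i":
            chunks.append(current)
            current = []
        else:
            current.append(word)
    chunks.append(current)
    # Drop empty chunks, e.g. from a leading {.i}.
    return [chunk for chunk in chunks if chunk]


def parse_with_resync(words, parse_sentence):
    """Parse chunk by chunk; after an error, resume at the next {.i}."""
    results = []
    for chunk in split_sentences(words):
        try:
            results.append(("ok", parse_sentence(chunk)))
        except SyntaxError as err:
            # Record the failure and carry on with the next sentence.
            results.append(("error", chunk, str(err)))
    return results
```

A failed sentence is reported alongside the ones that parsed, so one bad word does not kill the whole text.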

Anyway, I'm in over my head.  I'm not a parser expert.

--
Daniel Brockman
daniel@brockman.se

