From: "Chris Capel"
To: lojban-list@lojban.org
Date: Wed, 5 Nov 2008 18:30:23 -0600
Subject: [lojban] Re: experimental cmavo in lojgloss.

On Wed, Nov 5, 2008 at 18:07, Daniel Brockman wrote:
>> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
>> > just treat it as a self-contained construct that requires morphologically
>> > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
>> > correct Lojban before it (just like everything else).
>>
>> How far before it? Up to the beginning of the sentence? The statement?
>
> The {le'ai} construct doesn't care about ANYTHING else. However your parser
> works, that's how it works before {le'ai}.

I don't understand. You're saying that if there's a lo'ai then
everything before it in the text should get only a syntactical parse,
not a grammatical parse? If not, there has to be some cutoff.

>> > Of course it would require extraordinary methods to get things like
>> > {kwama lo'ai kwama sa'ai klama le'ai} --- or why not
>> > {fsen.45ynl5tnerg98ehg4n su coi} --- to parse. It's not practical and not
>> > cost-efficient. The {kjama} example falls in this category because {kj}
>> > is morphologically invalid.
>>
>> Hmm. I think you overestimate the difference in effort between the two
>> implementations.
>> They both require the same tricks, just at a slightly
>> different level in the grammar.
>
> What are you talking about? One implementation is self-contained; the other
> requires lots of weird backtracking and re-parsing and weird, weird stuff.

No, both require backtracking (but not reparsing, since this is a
packrat parser) and lots of lookahead that's usually wasted (but
hopefully fast). You have to check every sentence (or whatever) for
lo'ai before the main grammar parse, whether you do it before or after
the morph parse. If you want to see how that's implemented, take a
look at SA. Now, SA has a much more complicated grammar, so lo'ai
would be easier to implement even using the same technique. (And
contrary to Jorge, I'm not too sure it would introduce any weird
interactions with the SA machinery.)

> It doesn't matter if it has the same parse tree. It only matters that it
> PARSES IN ANY WAY. If it does, then the parser will be able to continue.
> If it doesn't, then the parser will die.

I'm more concerned about interactive parsing, where parse errors aren't
a huge deal, especially because you get detailed and helpful error
information, much, much better than jbofi'e's, to help you find the
problem.

I think perhaps a better (simple) way to handle lo'ai is to treat it
like a plain old lo'u ... le'u quote. It would still behave like a UI,
but the words up to the le'ai would only be morph-parsed. In fact, I
imagine a number of experimental cmavo that create new selma'o could
be handled cursorily as quotes of this kind. It's not ideal, but it
lets a non-expert user configure the parser to handle text using these
cmavo better than before.

Chris Capel
-- 
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)
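
P.S. For concreteness, here is a rough sketch of the quote-style handling of lo'ai suggested above. It is Python written purely for illustration and is not taken from lojgloss or any existing parser. The names Correction, looks_like_lojban and fold_corrections are made up for this example, and the alphabet check is only a toy stand-in for a real morphology pass (it would reject {kwama} because of the w, but it would not catch an invalid cluster like {kj}). Treating a span with no sa'ai as having an empty replacement is also just an assumption.

```python
from dataclasses import dataclass
from typing import List, Union

# Letters used in Lojban text, plus apostrophe, comma and period.
LOJBAN_LETTERS = set("abcdefgijklmnoprstuvxyz',.")


@dataclass
class Correction:
    """Opaque node standing in for one lo'ai ... sa'ai ... le'ai span."""
    wrong: List[str]  # words between lo'ai and sa'ai
    right: List[str]  # words between sa'ai and le'ai


def looks_like_lojban(word: str) -> bool:
    # Toy stand-in for a morph parse: checks the alphabet only.
    return bool(word) and all(ch in LOJBAN_LETTERS for ch in word.lower())


def fold_corrections(words: List[str]) -> List[Union[str, Correction]]:
    """Replace each lo'ai ... le'ai span with a single Correction node.

    Words inside the span are only checked word by word; the main
    grammar never sees them, much as with a lo'u ... le'u quote.
    """
    out: List[Union[str, Correction]] = []
    i = 0
    while i < len(words):
        if words[i] != "lo'ai":
            out.append(words[i])
            i += 1
            continue
        try:
            end = words.index("le'ai", i + 1)
        except ValueError:
            raise SyntaxError("lo'ai without a closing le'ai")
        inner = words[i + 1:end]
        bad = [w for w in inner if not looks_like_lojban(w)]
        if bad:
            raise SyntaxError("not morphologically valid: " + " ".join(bad))
        if "sa'ai" in inner:
            k = inner.index("sa'ai")
            wrong, right = inner[:k], inner[k + 1:]
        else:
            # Assumption: no sa'ai means there is no replacement text.
            wrong, right = inner, []
        out.append(Correction(wrong, right))
        i = end + 1
    return out
```

Calling fold_corrections on the words of {mi klama lo'ai klama sa'ai cadzu le'ai lo zarci} leaves mi, klama, lo and zarci as ordinary words and puts one Correction node between klama and lo. The main grammar could then treat that node the way it treats a UI.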