From richard@rrbcurnow.freeuk.com Tue Jun 06 13:28:34 2000
Date: Tue, 6 Jun 2000 21:07:26 +0100
From: Richard Curnow
To: lojban@egroups.com
Subject: Re: [lojban] (Technical) Problem area in v3 grammar
Message-ID: <20000606210726.A126@rrbcurnow.freeuk.com>
In-Reply-To: <4.2.2.20000606001958.00abae60@127.0.0.1>; from lojbab@lojban.org on Tue, Jun 06, 2000 at 12:33:51AM -0400

On Tue, Jun 06, 2000 at 12:33:51AM -0400, Bob LeChevalier (lojbab) wrote:
>
> I figure it is about time to step in and say something, because this
> problem has put us on new and uncertain ground and I have passed it to the
> LLG Board to decide the baseline policy issue pertaining to this problem.
>
> Note that, even if it is the "right solution", there is likely to be
> considerable debate amongst the Board as to whether we should break the
> baseline to make the fix official, indeed especially since the change is so
> incidental that it sounds like people are on the point of classifying it
> like one of several other unfortunate inconsistencies in the
> Book. (Erroneous parses, on the other hand are a significantly undesirable
> situation, so nothing is cut and dried).
>
> It is possible that the change may be left as an unofficial
> possible-solution to be tried out until the baseline period is over. Or it
> may remain in limbo until/unless we can publish a 2nd edition of the
> Book. (There may have to be consideration for the perhaps 20% of our
> purchasers who have bought copies through bookstores or otherwise
> indirectly, who we could not contact with a correction/errata on the
> necessary change to the book baseline, which some might feel an essential
> requirement for us if we change the baseline officially.)
>
> People are welcome to continue to explore ramifications of the problem and
> its possible solutions, but the impact of this problem and its potential
> solution on the baseline policy, if any, will be dealt with separately from
> the technical issue itself.

Fortunately the situation does not seem to be so bad. The "la frank.
sanli..." example from the Book *DOES* actually make it through a parser
based on the BNF grammar definition (albeit in a slightly unexpected
way). As John has pointed out, the construct exhibited by this example
can also be used as a workaround to the original "pu zi gi gi " wart I
found.

So at a fundamental level, the Book and the machine grammar *CAN* be
considered mutually consistent, *AND* a workaround to my problem exists
(i.e. use 'tag termset' where termset="nu'i sumti", instead of 'tag
sumti' where the original problem exists) within the scope of the
existing grammar and the Book.
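The workaround above amounts to a mechanical rewrite of one term shape into another. A minimal Python sketch of that rewrite follows. It is illustrative only and is not how jbofi'e or the machine grammar is implemented: the class names, the functions `to_termset_form` and `render`, and the sample sumti "le zdani" are all assumptions made up for this example, and the termset is modelled only in the "nu'i sumti" form quoted above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sumti:
    text: str  # surface text of the sumti, treated as opaque here


@dataclass(frozen=True)
class Termset:
    # Hypothetical model of the termset = "nu'i sumti" form only.
    sumti: Sumti


@dataclass(frozen=True)
class Tagged:
    tag: str                      # e.g. "zu'a" or "pu zi"
    arg: Union[Sumti, Termset]    # what the tag is attached to


def to_termset_form(term: Tagged) -> Tagged:
    """Rewrite 'tag sumti' as 'tag termset'; leave 'tag termset' unchanged."""
    if isinstance(term.arg, Sumti):
        return Tagged(term.tag, Termset(term.arg))
    return term


def render(term: Tagged) -> str:
    """Render a tagged term back to text, adding nu'i for the termset form."""
    if isinstance(term.arg, Termset):
        return f"{term.tag} nu'i {term.arg.sumti.text}"
    return f"{term.tag} {term.arg.text}"


if __name__ == "__main__":
    original = Tagged("zu'a", Sumti("le zdani"))  # made-up sample input
    print(render(original))                   # zu'a le zdani
    print(render(to_termset_form(original)))  # zu'a nu'i le zdani
```

The rewrite is idempotent: a term already in the 'tag termset' form is returned as-is, so it is safe to apply to every tagged term in a text.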
The 'tag sumti' variant can be considered an optional rule (since it can
always be expressed with the 'tag termset' method), which removes the
remaining shift/reduce conflict that I found in the grammar.

Thus, the good news is that there isn't a fundamental need to change the
Book or the grammar. This will be something of a relief, I am sure,
considering the issues Bob highlights in the quote above.

From the practical viewpoint of building a language processing tool
though, it does seem desirable to add a rule for "term = tag termset" to
the grammar. As I said yesterday, the "la frank. sanli..." example
parses the zu'a and the termset as two independent terms in a sequence,
the zu'a looking like a floating tense with the whole bridi as its
scope. The idiom envisaged in the Book is clear - in this case the zu'a
has to be considered bound to the following termset as a tense ranging
over that termset, not as a floating tense.

Based on this, the intent of the Book is that whenever 2 terms occur in
sequence with the 1st being a tag and the 2nd a termset, they are
considered bound together in this way. Hence my assertion that adding a
"term = tag termset" to the grammar doesn't contradict the existing
materials (the Book and grammar definition) - it merely codifies this
idiom into the grammar definition directly.

Anyway, I'm including this extra 'term' rule into v0.33 of jbofi'e when
it appears, since it gives practical benefits for analysing texts using
this construct. The casualties are texts that intentionally use a
floating tense without a terminating ku immediately before a termset.
These are inconsistent with the "la frank. sanli..." example and
supporting text from the Book anyway.

-- 
----------------------------------------------------------------------
Richard P. Curnow
rpc@myself.com
Weston-super-Mare
United Kingdom
http://www.rrbcurnow.freeuk.com/