From richard@rrbcurnow.freeuk.com Tue Jun 06 13:28:34 2000
Date: Tue, 6 Jun 2000 21:07:26 +0100
To: lojban@egroups.com
Subject: Re: [lojban] (Technical) Problem area in v3 grammar
Message-ID: <20000606210726.A126@rrbcurnow.freeuk.com>
Reply-To: Richard Curnow <rpc@myself.com>
Mail-Followup-To: lojban@egroups.com
References: <00e501bfce8b$5bed0820$2f75bad0@cjnelson> <200006041853.OAA13654@locke.ccil.org> <00e501bfce8b$5bed0820$2f75bad0@cjnelson> <20000605214756.A431@rrbcurnow.freeuk.com> <4.2.2.20000606001958.00abae60@127.0.0.1>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <4.2.2.20000606001958.00abae60@127.0.0.1>; from lojbab@lojban.org on Tue, Jun 06, 2000 at 12:33:51AM -0400
From: Richard Curnow <richard@rrbcurnow.freeuk.com>

On Tue, Jun 06, 2000 at 12:33:51AM -0400, Bob LeChevalier (lojbab) wrote:
> 
> I figure it is about time to step in and say something, because this 
> problem has put us on new and uncertain ground and I have passed it to the 
> LLG Board to decide the baseline policy issue pertaining to this problem.
> 
> Note that, even if it is the "right solution", there is likely to be 
> considerable debate amongst the Board as to whether we should break the 
> baseline to make the fix official, indeed especially since the change is so 
> incidental that it sounds like people are on the point of classifying it 
> like one of several other unfortunate inconsistencies in the 
> Book. (Erroneous parses, on the other hand are a significantly undesirable 
> situation, so nothing is cut and dried).
> 
> It is possible that the change may be left as an unofficial 
> possible-solution to be tried out until the baseline period is over. Or it 
> may remain in limbo until/unless we can publish a 2nd edition of the 
> Book. (There may have to be consideration for the perhaps 20% of our 
> purchasers who have bought copies through bookstores or otherwise 
> indirectly, who we could not contact with a correction/errata on the 
> necessary change to the book baseline, which some might feel an essential 
> requirement for us if we change the baseline officially.)
> 
> People are welcome to continue to explore ramifications of the problem and 
> its possible solutions, but the impact of this problem and its potential 
> solution on the baseline policy, if any, will be dealt with separately from 
> the technical issue itself.

Fortunately the situation does not seem to be so bad. The "la frank.
sanli..." example from the Book *DOES* actually make it through a parser
based on the BNF grammar definition (albeit in a slightly unexpected
way). As John has pointed out, the construct exhibited by this example
can also be used as a workaround to the original "pu zi gi <sumti> gi
<sumti>" wart I found. So at a fundamental level, the Book and the
machine grammar *CAN* be considered mutually consistent, *AND* a
workaround to my problem exists (i.e. use 'tag termset' where
termset="nu'i sumti", instead of 'tag sumti' where the original problem
exists) within the scope of the existing grammar and the Book. The 'tag
sumti' variant can then be treated as an optional rule (since it can
always be expressed with the 'tag termset' method), and leaving it out
removes the remaining shift/reduce conflict that I found in the grammar.

Thus, the good news is that there isn't a fundamental need to change the
Book or the grammar. This will be something of a relief, I am sure,
considering the issues Bob highlights in the quote above.

From the practical viewpoint of building a language processing tool
though, it does seem desirable to add a rule for "term = tag termset" to
the grammar. As I said yesterday, the "la frank. sanli..." example
parses the zu'a and the termset as two independent terms in a sequence,
the zu'a looking like a floating tense with the whole bridi as its
scope. The idiom envisaged in the Book is clear - in this case the zu'a
has to be considered bound to the following termset as a tense ranging
over that termset, not as a floating tense.

Based on this, the intent of the Book is that whenever two terms occur
in sequence, the first being a tag and the second a termset, they are
considered bound together in this way. Hence my assertion that adding a
"term = tag termset" rule to the grammar doesn't contradict the existing
materials (the Book and grammar definition) - it merely codifies this
idiom directly in the grammar definition.
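
As a rough illustration of the binding described above, here is a
minimal Python sketch. It is not taken from jbofi'e and it is not a
grammar production: it is a post-pass over an already parsed list of
terms, which is an assumption made purely to keep the example short.
The names Tag, Termset, Sumti, TaggedTermset and bind_tags are
hypothetical, and the termset text is left elided just as in the
example quoted from the Book.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Tag:
    """A tense/modal with no sumti after it, e.g. zu'a (hypothetical type)."""
    words: str
    has_ku: bool = False  # True when the tag is explicitly terminated with ku


@dataclass
class Termset:
    """A nu'i ... termset; its contents are not modelled here."""
    words: str


@dataclass
class Sumti:
    words: str


@dataclass
class TaggedTermset:
    """What the extra rule 'term = tag termset' would produce."""
    tag: Tag
    termset: Termset


Term = Union[Tag, Termset, Sumti, TaggedTermset]


def bind_tags(terms: List[Term]) -> List[Term]:
    """Bind each unterminated tag to a termset that immediately follows it."""
    out: List[Term] = []
    i = 0
    while i < len(terms):
        cur = terms[i]
        nxt: Optional[Term] = terms[i + 1] if i + 1 < len(terms) else None
        if isinstance(cur, Tag) and not cur.has_ku and isinstance(nxt, Termset):
            # Tag directly followed by a termset: treat the pair as one term.
            out.append(TaggedTermset(cur, nxt))
            i += 2
        else:
            # Anything else (including a tag closed with ku) is left alone.
            out.append(cur)
            i += 1
    return out


# Trailing terms of the Book example as the plain BNF sees them:
# two independent terms, the first looking like a floating tense.
flat = [Tag("zu'a"), Termset("nu'i ...")]
print(bind_tags(flat))

# With an explicit ku the tag stays a separate, floating term.
floating = [Tag("zu'a", has_ku=True), Termset("nu'i ...")]
print(bind_tags(floating))
```

In this sketch the first call yields a single TaggedTermset, while the
second leaves both terms separate, because a tag closed with ku is
already a complete term on its own.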

Anyway, I'm including this extra 'term' rule in v0.33 of jbofi'e when
it appears, since it gives practical benefits for analysing texts using
this construct. The casualties are texts that intentionally use a
floating tense without a terminating ku immediately before a termset.
These are inconsistent with the "la frank. sanli..." example and
supporting text from the Book anyway.

-- 
----------------------------------------------------------------------
Richard P. Curnow rpc@myself.com
Weston-super-Mare
United Kingdom http://www.rrbcurnow.freeuk.com/