Date: Wed, 12 Jan 2011 18:25:24 -0700
From: ".alyn.post."
To: Lojban List
Subject: [lojban] inconsistency between PEG grammar and CLL 17.4

[This fell out of my researching SA. tl;dr: I've found two bugs in
BU handling in the PEG grammar.]

CLL 17.4[1] contains an interesting passage:

    Formally, “bu” may be attached to any single Lojban word.
    Compound cmavo do not count as words for this purpose.
    The special cmavo “ba'e”, “za'e”, “zei”, “zo”, “zoi”, “la'o”,
    “lo'u”, “si”, “sa”, “su”, and “fa'o” may not have “bu” attached,
    because they are interpreted before “bu” detection is done; in
    particular,

    4.1) zo bu
         the word “bu”

    is needed when discussing “bu” in Lojban. It is also illegal to
    attach “bu” to itself, but more than one “bu” may be attached to
    a word; thus “.abubu” is legal, if ugly. (Its meaning is not
    defined, but it is presumably different from “.abu”.) It does not
    matter if the word is a cmavo, a cmene, or a brivla. All such
    words suffixed by “bu” are treated grammatically as if they were
    cmavo belonging to selma'o BY. However, if the word is a cmene it
    is always necessary to precede and follow it by a pause, because
    otherwise the cmene may absorb preceding or following words.

I do wish the CLL explained why these cmavo are special. It doesn't,
so I'm going to pretend the reason is immutable and run some tests:

;; Let's establish a baseline of what camxes does in normal cases.
;;

; gismu
-> broda bu
text
buClauseNoPre
|- BRIVLA
|  gismu: broda
|- CMAVO
   BU: bu

; lujvo
-> rodbo'e bu
rodbo'e bu
text
buClauseNoPre
|- BRIVLA
|  lujvo: rodbo'e
|- CMAVO
   BU: bu

; fu'ivla
-> fiorso bu
fiorso bu
text
buClauseNoPre
|- BRIVLA
|  fuhivla: fiorso
|- CMAVO
   BU: bu

; cmene
-> .alyn. bu
text
buClauseNoPre
|- CMENE
|  cmene: alyn
|- CMAVO
   BU: bu

; cmavo (this could certainly be exhaustive)
-> .abu
text
buClauseNoPre
|- CMAVO
|  A: a
|- CMAVO
   BU: bu

-> lobu
text
buClauseNoPre
|- CMAVO
|  LE: lo
|- CMAVO
   BU: bu

So far, this has all passed through the same production(s) and the
PEG grammar agrees with the CLL (and you can see where this is
going):

bu-clause-no-pre <- pre-zei-bu (bu-tail? zei-tail)* bu-tail post-clause

zei-tail <- (ZEI-clause any-word)+

bu-tail <- BU-clause+

pre-zei-bu <- ( !BU-clause !ZEI-clause !SI-clause !SA-clause
                !SU-clause !FAhO-clause any-word-SA-handling )
              si-clause?
Let's try the forbidden cmavo:

; forbidden cmavo
-> ba'e bu
text
buClauseNoPre
|- CMAVO
|  BAhE: ba'e
|- CMAVO
   BU: bu

-> za'e bu
text
buClauseNoPre
|- CMAVO
|  BAhE: za'e
|- CMAVO
   BU: bu

These are the entirety of BAhE, and the CLL is inconsistent with the
PEG grammar. I believe you were just complaining about BAhE today,
Robin. :-D It gets better...

-> zei bu
[ shouldn't and doesn't parse ]

-> zo bu
text
ZOPre
|- CMAVO
|  ZO: zo
|- CMAVO
   BU: bu

Note the parse tree differs and is presumably correct.

-> zoi bu
[ shouldn't and doesn't parse ]

-> la'o bu
[ shouldn't and doesn't parse ]

These are the entirety of ZOI, and the PEG and CLL are mutually
consistent.

-> lo'u bu
[ shouldn't and doesn't parse ]

-> si bu
[ shouldn't and doesn't parse ]

-> sa bu
[ shouldn't and doesn't parse ]

-> su bu
text
buClauseNoPre
|- CMAVO
|  SU: su
|- CMAVO
   BU: bu

Now wait just a minute here. The rule above *explicitly forbids* SU.
How is it that it is matching?

-> fa'o bu
[ shouldn't and doesn't parse ]

-> bu bu
[ shouldn't and doesn't parse ]

The answer as to why SU matches has to do with this tricky little
interaction:

SU-clause <- SU-pre SU-post
SU-pre <- pre-clause SU `spaces?
SU-post <- post-clause

; Handling of what can go after a cmavo
post-clause <- `spaces? si-clause? !ZEI-clause !BU-clause indicators*

pre-clause <- BAhE-clause?

pre-zei-bu evaluates !SU-clause. SU-clause matches the SU itself just
fine, but it ends in the post-clause production, which contains the
rule !BU-clause, and there does happen to be a BU next. So
post-clause no-matches, SU-post no-matches, and that causes the whole
SU-clause production to no-match. A failed SU-clause means the
negative lookahead !SU-clause succeeds, so our check that we don't
have SU never rejects it.

This could apply to all of pre-zei-bu's '!BRODA-clause' rules; it
might be some other consequence of the grammar that prevents BU, ZEI,
SI, SA, and FAhO from being parsed. Let's check them.

BU-clause, ZEI-clause, SI-clause, SA-clause, and FAhO-clause don't
use the post-clause production. They therefore don't have this
problem. It appears to be unique to SU.
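
For anyone who wants to poke at the SU interaction in isolation, here
is a minimal Python sketch of it. It is not camxes and not a patch:
it assumes the input is already split into words, it models only
BU-clause, post-clause, SU-clause and a cut-down pre-zei-bu, and it
leaves out spaces, si-clause, indicators, ZEI, pre-clause and the
other lookaheads. All of the function names are made up for the
sketch.

```python
# Toy model of the SU / BU interaction. NOT camxes: the input is a
# pre-split list of words and only the rules needed to show the effect
# are modelled. Every function name here is hypothetical.
# Each rule takes (words, pos) and returns the new position or None.

def bu_clause(words, pos):
    """BU-clause, reduced to 'the next word is bu'."""
    if pos < len(words) and words[pos] == "bu":
        return pos + 1
    return None

def post_clause(words, pos):
    """post-clause, reduced to its !BU-clause lookahead (consumes nothing)."""
    if bu_clause(words, pos) is not None:
        return None  # a following bu makes post-clause fail
    return pos

def su_clause(words, pos):
    """SU-clause <- SU-pre SU-post, with pre-clause left out."""
    if pos < len(words) and words[pos] == "su":
        return post_clause(words, pos + 1)
    return None

def not_(rule, words, pos):
    """PEG negative lookahead: succeeds exactly when the rule fails."""
    return rule(words, pos) is None

def pre_zei_bu(words, pos):
    """pre-zei-bu, reduced to !BU-clause !SU-clause any-word."""
    if (not_(bu_clause, words, pos)
            and not_(su_clause, words, pos)
            and pos < len(words)):
        return pos + 1
    return None

def bu_clause_no_pre(words):
    """bu-clause-no-pre, reduced to pre-zei-bu bu-tail.

    Returns True if the whole input is consumed.
    """
    pos = pre_zei_bu(words, 0)
    if pos is None:
        return False
    end = bu_clause(words, pos)      # bu-tail needs at least one BU-clause
    if end is None:
        return False
    while True:                      # ... and takes as many more as it can
        nxt = bu_clause(words, end)
        if nxt is None:
            break
        end = nxt
    return end == len(words)

print(su_clause(["su", "broda"], 0))      # 1: SU-clause matches, no bu follows
print(su_clause(["su", "bu"], 0))         # None: post-clause sees the bu
print(bu_clause_no_pre(["su", "bu"]))     # True: !SU-clause let su through
print(bu_clause_no_pre(["bu", "bu"]))     # False: !BU-clause rejects it
```

In this toy version the only thing that makes su_clause fail on
"su bu" is the bu lookahead inside post_clause, which is the same
chain of no-matches described above.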

This leaves us with two problems. We need to improve pre-zei-bu to:

 * not permit BAhE (note that BAhE-post *also* has a !BU-clause, so
   it would in theory suffer from the same problem SU does).

 * for-real no-match SU by not triggering the !BU-clause rule.

Shall I run back to my cave and formulate a patch, or is it so
obvious that you can do it?

-Alan

PS: I've checked the CLL Errata[2] and the Suggestions for the CLL
second edition[3] and neither of those documents has an entry for
CLL 17.4. I assume the PEG grammar is mistaken here, and should be
fixed.

1: http://dag.github.com/cll/17/4/
2: http://www.lojban.org/tiki/tiki-index.php?page=CLL,+aka+Reference+Grammar,+Errata
3: http://www.lojban.org/tiki/tiki-index.php?page=Suggestions+for+CLL%2C+second+edition

-- 
.i ko djuno fi le do sevzi