From: "Jonathan Gibbons"
Date: Wed, 12 Jul 2006 01:05:55 -0400
To: lojban-beginners@lojban.org
Subject: [lojban-beginners] Re: Enumerating in Lojban
> non-sentence: {ko catlu le nu le smacu bajra}

That doesn't seem like it ought to be a non-sentence to me, save for the official parser being quirky.

---A stream of logic follows, probably poorly stated in English, describing a derivation; reading the paragraphs below it first may be preferable.---

So, as tokens, that's "KOhA BRIVLA LE NU LE BRIVLA BRIVLA".

Starting at text, we can only take the option of text-1 with no optional elements, thence to paragraphs, then to paragraph. That cannot be a fragment of type terms because of the BRIVLA token in the second place. In fact it cannot be a fragment at all, so it must be a statement. There is no prenex, so it's just a plain statement-1. There is no "I", so it stays plain through statement-2 and statement-3 all the way to sentence. It cannot be a plain bridi-tail, so it must be terms followed by bridi-tail. (I'll skip writing out the descent from terms to KOhA as sufficiently self-evident; please call me on it if you disagree.)

gihek cannot match starting at the second place, so we are forced from bridi-tail to bridi-tail-3. gek-sentence doesn't match, so it must derive to selbri followed by tail-terms. There is no tag, NA, or CO, which forces us through to just selbri-3. Assuming for the moment that a selbri-4 cannot begin with "LE", we continue with a single selbri-4. joik-jek and joik can't match, so we have just a selbri-5, which is just a selbri-6. No BO, NAhE, or guhek means it's just a tanru-unit. No CEI gives tanru-unit-1, and no linkargs gives just tanru-unit-2. There, the only possibilities are the first option, which is a raw BRIVLA, or any_word, which cannot match because the next token is not ZEI. (That rule is messy; is it really saying that any word followed by one or more sequences of "ZEI" and any word can be a tanru-unit-2?)
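For anyone following along, the step from words to selma'o tokens can be sketched in a few lines of Python. This is only a minimal sketch. It assumes a tiny hypothetical lexicon (SELMAHO) and treats every unlisted word as a brivla. A real Lojban lexer does considerably more, such as splitting run-together cmavo and checking brivla morphology.

```python
# HYPOTHETICAL mini-lexicon mapping a few cmavo to their selma'o.
# Any word not listed is ASSUMED to be a brivla (a real lexer checks morphology).
SELMAHO = {
    "ko": "KOhA", "mi": "KOhA",
    "le": "LE", "lo": "LE",
    "nu": "NU", "ku": "KU", "kei": "KEI",
}


def classify(text):
    """Return the selma'o token class of each whitespace-separated word."""
    return [SELMAHO.get(word, "BRIVLA") for word in text.lower().split()]


if __name__ == "__main__":
    print(" ".join(classify("ko catlu le nu le smacu bajra")))
    # KOhA BRIVLA LE NU LE BRIVLA BRIVLA
```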
That leaves the derivation of "LE NU LE BRIVLA BRIVLA" from tail-terms. There is no VAU, so that must go to plain terms. It seems clear that it cannot be two or more terms-1 concatenated (feel free to ask me to show that if it is not apparent). There is no PEhE or CEhE, so it must be a plain term. There is no tag, FA, termset, or NA, so it's a plain sumti. There is no VUhO, ek, joik, joik-ek, or gek, so it must derive down through that chain to a single sumti-5. Neither quantifier nor relative-clauses can match, so it's a plain sumti-6.

So we need a derivation from sumti-6 to "LE NU LE BRIVLA BRIVLA". The "(LA | LE) # sumti-tail /KU#/" alternative is the only one that can match the LE token. There are no matches for free and no KU, so that leaves "NU LE BRIVLA BRIVLA" to derive from sumti-tail. We cannot match sumti-6 against the beginning of this, and there are no relative clauses or quantifiers, so it must be sumti-tail-1 and thence selbri. No tag, NA, or CO means that it is a single selbri-3. It cannot be a selbri-4 followed by another selbri-4, because selbri-4 will not match "NU LE BRIVLA", so it must be a single selbri-4. No joik-jek, joik, jek, BO, or NAhE sees us clear to tanru-unit, and no CEI or linkargs gets us to tanru-unit-2.

There, the only rule that can match is "NU [NAI] # [joik-jek NU [NAI] #] ... subsentence /KEI#/". There is no NAI, free, joik-jek, or KEI, so that takes up the NU and requires a derivation of "LE BRIVLA BRIVLA" from subsentence. There is no prenex, so it is just a sentence. Because bridi-tail must have either a selbri or a gek-sentence, and we don't have a gek-sentence, and because the LE rule in sumti-6 will eventually require a selbri as well, this must be split into {terms bridi-tail} as "(LE BRIVLA) (BRIVLA)". (I daresay this is where an LALR(1)-based parser fails on this input. There are algorithms that are proven to handle any CFG, and they won't be tripped up by this issue.)
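The split above can be checked on a toy scale with a parser that considers every split point instead of committing after one token of lookahead. The sketch below uses a HYPOTHETICAL, heavily simplified grammar that is only loosely modeled on the rules named above. It is not bnf.300: free, tags, connectives, VAU, and quantifiers are omitted, and sumti-tail is collapsed straight to selbri. The grammar is paired with a memoized count of derivations over token spans. This is a chart-parsing idea, not an Earley or GLR implementation. The names GRAMMAR and count_parses are illustrative.

```python
from functools import lru_cache

# HYPOTHETICAL toy grammar, loosely modeled on the bnf.300 rules discussed above.
# NOT bnf.300: most alternatives are omitted and sumti-tail is collapsed to selbri.
# Uppercase names are terminals. There are no empty rules and no cycles of
# single-symbol rules, which is what lets the span-based counting terminate.
GRAMMAR = {
    "sentence":   [["terms", "bridi_tail"], ["bridi_tail"]],
    "terms":      [["sumti"], ["sumti", "terms"]],
    "sumti":      [["KOhA"], ["LE", "selbri"], ["LE", "selbri", "KU"]],
    "bridi_tail": [["selbri"], ["selbri", "terms"]],
    "selbri":     [["tanru_unit"], ["tanru_unit", "selbri"]],
    "tanru_unit": [["BRIVLA"], ["NU", "sentence"], ["NU", "sentence", "KEI"]],
}


def count_parses(tokens, start="sentence"):
    """Count the distinct derivations of the whole token list from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: matches exactly one equal token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        # Try every split point; each remaining symbol needs at least one token.
        for k in range(i + 1, j - len(rhs) + 2):
            head = count(rhs[0], i, k)
            if head:
                total += head * count_seq(rhs[1:], k, j)
        return total

    return count(start, 0, len(tokens))


if __name__ == "__main__":
    print(count_parses("KOhA BRIVLA LE NU LE BRIVLA BRIVLA".split()))  # 1
```

On this toy grammar, count_parses returns the following:

- 1 for "KOhA BRIVLA LE NU LE BRIVLA BRIVLA".
- 1 for the same string with KU after the inner sumti.
- 0 for "KOhA BRIVLA LE NU LE BRIVLA", where the abstraction has no selbri.
- 2 for "LE BRIVLA BRIVLA BRIVLA", where the sumti/selbri boundary could fall in either place.

Because it tries every split, it never has to make the decision a one-token-lookahead parser faces after "LE BRIVLA" when the next token is another BRIVLA.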
From there, it seems straightforward enough that "LE BRIVLA" is derivable from terms in exactly one way, and "BRIVLA" is derivable from bridi-tail in exactly one way. I'd rather not go through it, because I'm a bit tired of it, and if you've been checking this, I'm sure you are as well.

---end stream describing derivation---

That certainly seems to describe a valid derivation of "ko catlu le nu le smacu bajra" from the grammar given in bnf.300. It also shows that this is the only possible derivation of that sequence from bnf.300, which is to say that no grammatical ambiguity exists. As such, the elidable terminators should not be required, and "ko catlu le nu le smacu bajra" should have the same meaning as "ko catlu le nu le smacu ku bajra".

Of course, I fully acknowledge that this is not what the official parser does. But as we're both advocating a more formal replacement that is not, as it were, "bug-for-bug compatible", that doesn't seem quite relevant. And while that particular pair of strings was close enough to what I was after to require a check, it does not quite provide the counterexample I was looking for.

I really do want to know whether one actually exists, because it changes what I'm trying to do. Instead of building a grammar for Lojban that avoids hacks by using a more general parser than LALR, I would be defining a new language that is closely related to Lojban but is context-free and still non-LALR. Although, with the yacc parser as the official grammar, I suppose that's what I'd be doing anyway. But it feels different.

-Jonathan