Date: Wed, 12 Jul 2006 02:16:06 -0400
To: lojban@yahoogroups.com
From: "Jonathan Gibbons"
Reply-To: jonored@gmail.com
Subject: [lojban] Re: Is Lojban a CFG? (was Re: [lojban-beginners] Re: Enumerating in Lojban)

Forwarding the message I sent to the beginners list, because I took far too long to catch on to the moved-ness of this thread. Also continuing.

> le nu le broda brode brodi

Okay, now that one would be what I was looking for, I think. Tossing that out as ungrammatical would appear to make Lojban into a non-context-free language. (Being mildly nitpicky: the question "Is Lojban a CFG?" is trivially a no; very few languages (sets of strings) are context-free grammars (sets of production rules). "Is Lojban a CFL?" is, however, what it seems intended to ask. And, much to my sadness, the answer seems to be no. Why it has been defined not to be, I don't quite understand, but that does look like the counterexample. I still hold that making a language non-context-free is a very odd way of forcing it into unambiguity.)

Perhaps name the language I seem to be going to construct "narvablojban"? A language that is a proper superset of Lojban and that's /actually/ context-free as well as semantically unambiguous, rather than just close.

-A very sad Jonathan

---------- Forwarded message ----------
From: Jonathan Gibbons
Date: Jul 12, 2006 1:05 AM
Subject: Re: [lojban-beginners] Re: Enumerating in Lojban
To: lojban-beginners@lojban.org

> non-sentence: {ko catlu le nu le smacu bajra}

That doesn't seem like it ought to be a non-sentence to me, save for the official parser being quirky.
---stream of logic, probably poorly stated in English, describing a derivation follows; reading the paragraphs below first may be preferable.---

So, as tokens that's "KOhA BRIVLA LE NU LE BRIVLA BRIVLA". Starting at text, we can only take the option of text-1 with no optional elements, thence to paragraphs, then to paragraph. It cannot be a fragment of type terms because of the token of type BRIVLA in the second place; in fact it cannot be a fragment at all, so it must be a statement. There is no prenex, so it's just a plain statement-1; there is no "I", so it is plain through statement-2 and statement-3 all the way to sentence. It cannot be a plain bridi-tail, so it must have terms followed by bridi-tail. I'll skip writing the descent from terms to KOhA as sufficiently self-evident; please call me on it if you disagree.

gihek cannot match starting at the second place, so we are forced from bridi-tail to bridi-tail-3. gek-sentence doesn't match, so it must derive to selbri followed by tail-terms. There is no tag, NA, or CO, which forces us through to just selbri-3. Assuming momentarily that a selbri-4 cannot begin with "LE", we continue with a single selbri-4. joik-jek and joik can't match, so we have just a selbri-5, which is just a selbri-6. No BO, NAhE, or guhek means it's just a tanru-unit; no CEI, so tanru-unit-1; no linkargs, so just tanru-unit-2. There the only possibilities are the first option, which is raw BRIVLA, or any_word, which cannot match because the next token is not ZEI. (That's messy; is that really what it's trying to say? Any word followed by one or more sequences of "ZEI" and any word can be a tanru-unit-2?)
That leaves the derivation of "LE NU LE BRIVLA BRIVLA" from tail-terms. There is no VAU, so that must go to plain terms. It seems clear that it cannot be two or more terms-1 concatenated (feel free to call on me to show that if it is not apparent). There is no PEhE or CEhE, so it must be a plain term; no tag, FA, termset, or NA, so it's a plain sumti; no VUhO, ek, joik, joik-ek, or gek, so it must derive down through that train to a single sumti-5. quantifier cannot match, nor can relative-clauses, so it's a plain sumti-6.

So, a derivation from sumti-6 to "LE NU LE BRIVLA BRIVLA". The "(LA | LE) # sumti-tail /KU#/" portion of the alt is the only one that can match the LE token; there are no matches for free, and no KU, so that leaves us with "NU LE BRIVLA BRIVLA" to derive from sumti-tail. We cannot match sumti-6 against the beginning of this, and there are no relative clauses or quantifiers, so it must be sumti-tail-1 and thence to selbri. No tag, NA, or CO means that it is a single selbri-3. It cannot be a selbri-4 followed by another selbri-4, because selbri-4 will not match "NU LE BRIVLA"; so it must be a single selbri-4. No joik-jek, joik, jek, BO, or NAhE sees us clear to tanru-unit; no CEI or linkargs gets us to tanru-unit-2. There, the only rule that can match is "NU [NAI] # [joik-jek NU [NAI] #] ... subsentence /KEI#/"; there is no NAI, free, joik-jek, or KEI, so that takes up the NU and requires a derivation of "LE BRIVLA BRIVLA" from subsentence.

There is no prenex, so it is just a sentence. Because bridi-tail must have either a selbri or a gek-sentence, and we don't have a gek-sentence, and because the LE rule in sumti-6 will eventually require a selbri as well, this must be split into {terms bridi-tail} in the fashion of "(LE BRIVLA) (BRIVLA)". (I daresay this is where an LALR(1)-based parser fails on this input; there are algorithms that are proved to handle any CFG, and they won't be tripped up by this issue.)
From there, it seems straightforward enough that "LE BRIVLA" is derivable from terms in exactly one way, and "BRIVLA" is derivable from bridi-tail in exactly one way; I'd rather not go through it, because I'm a bit tired of it, and if you've been checking this, I'm sure you are as well.

---end stream describing derivation---

That certainly seems to describe a valid derivation of "ko catlu le nu le smacu bajra" from the grammar given in bnf.300, and it also shows that this is the only possible derivation of that sequence from bnf.300, which is to say that no grammatical ambiguity exists. As such, the elidable terminators should not be required, and "ko catlu le nu le smacu bajra" should have the same meaning as "ko catlu le nu le smacu ku bajra".

Of course, I fully acknowledge that this is not what the official parser does, but as we're both advocating a more formal replacement that is not, as it were, "bug-for-bug compatible", that doesn't seem quite relevant. And while it was close enough to what I was wanting to require a check, that particular pair of strings does not quite provide the counterexample I was wanting. I really do want to know whether one actually exists, as it changes what I'm wanting to do: from building a grammar for Lojban that avoids hacks by using a more general parser than LALR, to defining a new language that is closely related to Lojban but is context-free and still non-LALR. Although with the yacc parser as the official grammar, I suppose that's what I'd be doing anyway. But it feels different.

-Jonathan
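The uniqueness argument above can be spot-checked mechanically on a small scale. What follows is only a minimal Python sketch under loud assumptions: the grammar is a hand-made toy fragment loosely modelled on the rules walked through in the message (sentence, terms, sumti, sumti-tail, bridi-tail, selbri, tanru-unit, with KU and KEI optional), and it is not bnf.300. The names SELMAHO, GRAMMAR, tokenize, and count_derivations are hypothetical and exist only for this illustration. Instead of an LALR(1) table, it counts every derivation by trying all split points, in the spirit of the general CFG algorithms mentioned above.

```python
from functools import lru_cache

# Hypothetical word-class lookup covering only the words used in the message.
SELMAHO = {"ko": "KOhA", "catlu": "BRIVLA", "le": "LE", "nu": "NU",
           "smacu": "BRIVLA", "bajra": "BRIVLA", "ku": "KU", "kei": "KEI"}

# Toy grammar (an assumption, NOT bnf.300). Optional terminators are written
# out as separate alternatives; there are no empty productions.
GRAMMAR = {
    "sentence":   [("terms", "bridi_tail"), ("bridi_tail",)],
    "terms":      [("sumti",), ("sumti", "terms")],
    "sumti":      [("KOhA",), ("LE", "sumti_tail"), ("LE", "sumti_tail", "KU")],
    "sumti_tail": [("selbri",)],
    "bridi_tail": [("selbri",), ("selbri", "terms")],
    "selbri":     [("tanru_unit",), ("tanru_unit", "selbri")],
    "tanru_unit": [("BRIVLA",), ("NU", "sentence"), ("NU", "sentence", "KEI")],
}


def tokenize(text):
    """Map each word to its class; unknown words raise KeyError."""
    return tuple(SELMAHO[word] for word in text.split())


def count_derivations(text, start="sentence"):
    """Count the distinct derivations of `text` from `start` in GRAMMAR."""
    tokens = tokenize(text)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        # Number of ways `sym` derives tokens[i:j].
        if sym not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == sym else 0
        return sum(sequence(rhs, i, j) for rhs in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return symbol(rhs[0], i, j)
        # The first symbol takes tokens[i:k]; every remaining symbol
        # needs at least one token, which bounds k from above.
        return sum(symbol(rhs[0], i, k) * sequence(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return symbol(start, 0, len(tokens))
```

Under this toy grammar, count_derivations("ko catlu le nu le smacu bajra") and count_derivations("ko catlu le nu le smacu ku bajra") both return 1, which matches the claim that the elided and explicit forms each have a single derivation. That says nothing about the full grammar or the official parser; it only shows that an exhaustive CFG-style search has no trouble with the "(LE BRIVLA) (BRIVLA)" split.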