From lojban-out@lojban.org Tue Jul 11 23:21:18 2006
Return-Path: <lojban-out@lojban.org>
X-Sender: lojban-out@lojban.org
X-Apparently-To: lojban@yahoogroups.com
Received: (qmail 50200 invoked from network); 12 Jul 2006 06:17:42 -0000
Received: from unknown (66.218.66.166)
  by m30.grp.scd.yahoo.com with QMQP; 12 Jul 2006 06:17:42 -0000
Received: from unknown (HELO chain.digitalkingdom.org) (64.81.49.134)
  by mta5.grp.scd.yahoo.com with SMTP; 12 Jul 2006 06:17:42 -0000
Received: from lojban-out by chain.digitalkingdom.org with local (Exim 4.62)
	(envelope-from <lojban-out@lojban.org>)
	id 1G0Y2M-0005Qn-Mg
	for lojban@yahoogroups.com; Tue, 11 Jul 2006 23:17:30 -0700
Received: from chain.digitalkingdom.org ([64.81.49.134])
	by chain.digitalkingdom.org with esmtp (Exim 4.62)
	(envelope-from <lojban-list-bounce@lojban.org>)
	id 1G0Y1V-0005Bw-Gu; Tue, 11 Jul 2006 23:16:38 -0700
Received: with ECARTIS (v1.0.0; list lojban-list); Tue, 11 Jul 2006 23:16:29 -0700 (PDT)
Received: from nobody by chain.digitalkingdom.org with local (Exim 4.62)	(envelope-from <nobody@digitalkingdom.org>)	id 1G0Y13-00055s-RI	for lojban-list-real@lojban.org; Tue, 11 Jul 2006 23:16:09 -0700
Received: from ug-out-1314.google.com ([66.249.92.169])	by chain.digitalkingdom.org with esmtp (Exim 4.62)	(envelope-from <jonored@gmail.com>)	id 1G0Y11-00055l-UB	for lojban-list@lojban.org; Tue, 11 Jul 2006 23:16:09 -0700
Received: by ug-out-1314.google.com with SMTP id s2so173333uge        for <lojban-list@lojban.org>; Tue, 11 Jul 2006 23:16:06 -0700 (PDT)
Received: by 10.66.219.11 with SMTP id r11mr275457ugg;        Tue, 11 Jul 2006 23:16:06 -0700 (PDT)
Received: by 10.67.30.12 with HTTP; Tue, 11 Jul 2006 23:16:06 -0700 (PDT)
Message-ID: <e202d93c0607112316i783ebc55i9ffdcef01c52b367@mail.gmail.com>
Date: Wed, 12 Jul 2006 02:16:06 -0400
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Spam-Score: -2.3 (--)
X-archive-position: 12144
X-ecartis-version: Ecartis v1.0.0
Errors-to: lojban-list-bounce@lojban.org
X-original-sender: jonored@gmail.com
X-list: lojban-list
To: lojban@yahoogroups.com
X-Originating-IP: 64.81.49.134
X-eGroups-Msg-Info: 1:0:0:0
X-eGroups-From: "Jonathan Gibbons" <jonored@gmail.com>
From: "Jonathan Gibbons" <lojban-out@lojban.org>
Reply-To: jonored@gmail.com
Subject: [lojban] Re: Is Lojban a CFG? (was Re: [lojban-beginners] Re: Enumerating in Lojban)
X-Yahoo-Group-Post: member; u=116389790; y=f2so2LrCGk3zabmxZM8es7x8EKkqDi4EDsHshaTyvvBR8Wgz5w
X-Yahoo-Profile: lojban_out
X-Yahoo-Message-Num: 26571

Forwarding the message I sent to the beginners list, because I took
far too long to catch on to the moved-ness of this thread. Also
continuing.

> le nu le broda brode brodi

Okay, now that one would be what I was looking for, I think.
Tossing that out as ungrammatical would appear to make Lojban a
non-context-free language. (Being mildly nitpicky: the answer to "Is
Lojban a CFG?" is trivially no, since a language is a set of strings
and a context-free grammar is a set of production rules. "Is Lojban a
CFL?" is, however, what it seems intended as. And, much to my sadness,
the answer seems to be no. Why it has been defined not to be I don't
quite understand, but that does look like the counterexample. I still
hold that making a language non-context-free is a very odd way of
forcing it into unambiguity.)

Perhaps the language I seem to be about to construct could be named
"narvablojban": a proper superset of Lojban that's /actually/
context-free as well as semantically unambiguous, rather than just
close.

-A very sad Jonathan

---------- Forwarded message ----------
From: Jonathan Gibbons <jonored@gmail.com>
Date: Jul 12, 2006 1:05 AM
Subject: Re: [lojban-beginners] Re: Enumerating in Lojban
To: lojban-beginners@lojban.org


> non-sentence: {ko catlu le nu le smacu bajra}

That doesn't seem like it ought to be a non-sentence to me, save for
the official parser being quirky.

---stream of logic, probably poorly stated in English, describing a
derivation, follows; reading the paragraphs below first may be
preferable.---

So, for tokens that's "KOhA BRIVLA LE NU LE BRIVLA BRIVLA"; starting
at text we can only take the option of text-1 with no optional
elements, thence to paragraphs,  then to paragraph, cannot be a
fragment of type terms because of the token of type BRIVLA in the
second place, in fact cannot be a fragment at all so it must be a
statement, no prenex so it's just a plain statement-1, no "I" so it is
plain through statement-2 and statement-3 all the way to sentence,
cannot be a plain bridi-tail so it must have terms followed by
bridi-tail, I'll skip writing the descent through from terms to KOhA
as sufficiently self-evident, please call me on it if you disagree,
gihek cannot match starting at the second place so we are forced from
bridi-tail to bridi-tail-3, gek-sentence doesn't match so it must
derive to selbri followed by tail-terms, there is no tag, NA, or CO,
which forces us through to just selbri-3, assuming momentarily that a
selbri-4 cannot begin with "LE" we continue with a single selbri-4,
joik-jek and joik can't match so we have just a selbri-5 which is just
a selbri-6, no BO NAhE or guhek means it's just a tanru-unit, no CEI
so tanru-unit-1, no linkargs so just tanru-unit-2, where the only
possibilities are the first option, which is raw BRIVLA, or any_word,
which cannot match because the next token is not ZEI (That's messy; is
that really what it's trying to say? Any word followed by one or more
sequences of "ZEI" and any word can be a tanru-unit-2?). That leaves
the derivation of "LE NU LE BRIVLA BRIVLA" from tail-terms; there is
no VAU, so that must go to plain terms; it seems clear that it cannot
be two or more terms-1 concatenated (feel free to call me to show that
if it is not apparent); there is no PEhE or CEhE so it must be a plain
term; no tag, FA, termset, or NA, so it's a plain sumti; no VUhO, ek,
joik, joik-ek, or gek, so it must derive down through that train to a
single sumti-5, quantifier cannot match, nor can relative-clauses, so
it's a plain sumti-6. So, a derivation from sumti-6 to "LE NU LE
BRIVLA BRIVLA".
The "(LA | LE) # sumti-tail /KU#/" portion of the alt is the only one
that can match the LE token, there are no matches for free, and no KU,
so that leaves us with "NU LE BRIVLA BRIVLA" to derive from
sumti-tail. We cannot match sumti-6 against the beginning of this and
there are no relative clauses or quantifiers, so it must be
sumti-tail-1 and thence to selbri; no tag, NA, or CO means that it is
a single selbri-3, it cannot be a selbri-4 followed by another
selbri-4 because selbri-4 will not match "NU LE BRIVLA"; so it must be
a single selbri-4. No joik-jek, joik, jek, BO, or NAhE sees us clear
to tanru-unit; no CEI or linkargs gets to tanru-unit-2. There, the
only rule that can match is "NU [NAI] # [joik-jek NU [NAI] #] ...
subsentence /KEI#/"; there is no NAI, free, joik-jek, or KEI, so that
takes up the NU and requires a derivation of "LE BRIVLA BRIVLA" from
subsentence. There is no prenex, so it is just a sentence. Because
bridi-tail must have either a selbri or a gek-sentence, and we don't
have a gek-sentence, and because the LE rule in sumti-6 will
eventually require a selbri as well, this must be split to {terms
bridi-tail} in the fashion of "(LE BRIVLA) (BRIVLA)" (I daresay this
is where an LALR(1)-based parser fails on this input; there are
algorithms that are proven to handle any CFG, and won't be tripped up by
this issue). From there, it seems straightforward enough that "LE
BRIVLA" is derivable from terms in exactly one way, and "BRIVLA" is
derivable from bridi-tail in exactly one way; I'd rather not go
through it because I'm a bit tired of it and if you've been checking
this, I'm sure you are as well.

---end stream describing derivation---

That certainly seems to describe a valid derivation of "ko catlu le nu
le smacu bajra" from the grammar given in bnf.300, and also shows that
it is the only possible derivation of that sequence from bnf.300,
which is to say, that no grammatical ambiguity exists. As such, the
elidable terminators should not be required, and "ko catlu le nu le
smacu bajra" should have the same meaning as "ko catlu le nu le smacu
ku bajra". Of course, I fully acknowledge that this is not what the
official parser does, but as we're both advocating a more formal
replacement that is not, as it were, "bug-for-bug compatible", that
doesn't seem quite relevant. And while that particular pair of strings
was close enough to what I wanted to be worth checking, it does not
quite provide the counterexample I was after. I really do want to know
whether one actually exists, as it changes what I'd be doing: from
building a grammar for Lojban that avoids hacks by using a more general
parser than LALR, to defining a new language that is closely related to
Lojban but is context-free and still non-LALR. Although with the yacc
parser as the official grammar, I suppose that's what I'd be doing
anyway. But it feels different.
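
The uniqueness claim above can be sanity-checked mechanically on a
small scale. What follows is only a sketch in Python, under loud
assumptions: GRAMMAR is a hypothetical toy fragment whose rule names
echo the ones used in the derivation (it is NOT bnf.300 and omits
almost everything), and count_derivations is an illustrative helper
that counts parse trees by trying every split of the token string,
which is the kind of general CFG technique alluded to above rather
than anything LALR-based.

```python
from functools import lru_cache

# Hypothetical toy fragment loosely modelled on the rule names discussed
# above. It is NOT the bnf.300 grammar; names not listed as keys are tokens.
GRAMMAR = {
    "sentence":   [["terms", "bridi_tail"], ["bridi_tail"]],
    "terms":      [["term"], ["term", "terms"]],
    "term":       [["sumti"]],
    "sumti":      [["KOhA"], ["LE", "sumti_tail"], ["LE", "sumti_tail", "KU"]],
    "sumti_tail": [["selbri"]],
    "bridi_tail": [["selbri"], ["selbri", "terms"]],
    "selbri":     [["tanru_unit"], ["tanru_unit", "selbri"]],
    "tanru_unit": [["BRIVLA"], ["NU", "sentence"], ["NU", "sentence", "KEI"]],
}


def count_derivations(tokens, start="sentence"):
    """Count the distinct parse trees deriving `tokens` from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The toy grammar has no empty productions, so every symbol covers
        # at least one token; the remaining symbols need len(rhs) - 1 tokens.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count_symbol(rhs[0], i, k)
            if left:
                total += left * count_sequence(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(tokens))


without_ku = count_derivations("KOhA BRIVLA LE NU LE BRIVLA BRIVLA".split())
with_ku = count_derivations("KOhA BRIVLA LE NU LE BRIVLA KU BRIVLA".split())
```

On this toy fragment both token strings should come out with exactly
one derivation each, which matches the argument above: the inner
"LE BRIVLA BRIVLA" can only split as "(LE BRIVLA) (BRIVLA)" because
the sentence still needs a selbri for its bridi-tail. That says
nothing about the full grammar, of course; it only shows that a
parser which explores all splits is not forced to demand the KU.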

-Jonathan

