[lojban-beginners] Re: Enumerating in Lojban
non-sentence: {ko catlu le nu le smacu bajra}
That doesn't seem like it ought to be a non-sentence to me, except
by way of the official parser being quirky.
---a stream of logic, probably poorly stated in English, describing a
derivation follows; reading the paragraphs below it first may be
preferable.---
So, for tokens that's "KOhA BRIVLA LE NU LE BRIVLA BRIVLA"; starting
at text we can only take the option of text-1 with no optional
elements, thence to paragraphs, then to paragraph, cannot be a
fragment of type terms because of the token of type BRIVLA in the
second place, in fact cannot be a fragment at all so it must be a
statement, no prenex so it's just a plain statement-1, no "I" so it is
plain through statement-2 and statement-3 all the way to sentence,
cannot be a plain bridi-tail so it must have terms followed by
bridi-tail, I'll skip writing the descent through from terms to KOhA
as sufficiently self-evident, please call me on it if you disagree,
gihek cannot match starting at the second place so we are forced from
bridi-tail to bridi-tail-3, gek-sentence doesn't match so it must
derive to selbri followed by tail-terms, there is no tag, NA, or CO,
which forces us through to just selbri-3, assuming momentarily that a
selbri-4 cannot begin with "LE" we continue with a single selbri-4,
joik-jek and joik can't match so we have just a selbri-5 which is just
a selbri-6, no BO NAhE or guhek means it's just a tanru-unit, no CEI
so tanru-unit-1, no linkargs so just tanru-unit-2, where the only
possibilities are the first option, which is raw BRIVLA, or any_word,
which cannot match because the next token is not ZEI (that's messy; is
that really what it's trying to say? Any word followed by one or more
sequences of "ZEI" and any word can be a tanru-unit-2?). That leaves
the derivation of "LE NU LE BRIVLA BRIVLA" from tail-terms; there is
no VAU, so that must go to plain terms; it seems clear that it cannot
be two or more terms-1 concatenated (feel free to call me to show that
if it is not apparent); there is no PEhE or CEhE so it must be a plain
term; no tag, FA, termset, or NA, so it's a plain sumti; no VUhO, ek,
joik, joik-ek, or gek, so it must derive down through that train to a
single sumti-5, quantifier cannot match, nor can relative-clauses, so
it's a plain sumti-6. So, a derivation from sumti-6 to "LE NU LE
BRIVLA BRIVLA".
In sumti-6, the "(LA | LE) # sumti-tail /KU#/" alternative is the only one
that can match the LE token, there are no matches for free, and no KU,
so that leaves us with "NU LE BRIVLA BRIVLA" to derive from
sumti-tail. We cannot match sumti-6 against the beginning of this and
there are no relative clauses or quantifiers, so it must be
sumti-tail-1 and thence to selbri; no tag, NA, or CO means that it is
a single selbri-3, it cannot be a selbri-4 followed by another
selbri-4 because selbri-4 will not match "NU LE BRIVLA"; so it must be
a single selbri-4. No joik-jek, joik, jek, BO, or NAhE sees us clear
to tanru-unit; no CEI or linkargs gets to tanru-unit-2. There, the
only rule that can match is "NU [NAI] # [joik-jek NU [NAI] #] ...
subsentence /KEI#/"; there is no NAI, free, joik-jek, or KEI, so that
takes up the NU and requires a derivation of "LE BRIVLA BRIVLA" from
subsentence. There is no prenex, so it is just a sentence. Because
bridi-tail must have either a selbri or a gek-sentence, and we don't
have a gek-sentence, and because the LE rule in sumti-6 will
eventually require a selbri as well, this must be split to {terms
bridi-tail} in the fashion of "(LE BRIVLA) (BRIVLA)" (I daresay this
is where an LALR(1)-based parser fails on this input; there are
algorithms that are proven to handle any CFG, and won't be tripped up by
this issue). From there, it seems straightforward enough that "LE
BRIVLA" is derivable from terms in exactly one way, and "BRIVLA" is
derivable from bridi-tail in exactly one way; I'd rather not go
through it because I'm a bit tired of it and if you've been checking
this, I'm sure you are as well.
---end stream describing derivation---
That certainly seems to describe a valid derivation of "ko catlu le nu
le smacu bajra" from the grammar given in bnf.300, and also shows that
it is the only possible derivation of that sequence from bnf.300,
which is to say, that no grammatical ambiguity exists. As such, the
elidable terminators should not be required, and "ko catlu le nu le
smacu bajra" should have the same meaning as "ko catlu le nu le smacu
ku bajra". Of course, I fully acknowledge that this is not what the
official parser does, but as we're both advocating a more formal
replacement that is not, as it were, "bug-for-bug compatible", that
doesn't seem to be quite relevant; and, while being close enough to
what I was wanting to require a check, that particular pair of strings
does not quite provide the counterexample I was wanting. I really do
want to know if one actually exists, as it changes what I'm wanting to
do from building a grammar for Lojban that avoids hacks by using a
more general parser than LALR, to defining a new language that is
closely related to Lojban but is context-free and still non-LALR.
Although with the yacc parser as the official grammar, I suppose
that's what I'd be doing anyways. But it feels different.
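
For anyone who would rather check this sort of claim mechanically than by
hand, here is a minimal sketch in Python. It assumes a toy grammar that I
made up for illustration; it is a heavily simplified fragment and NOT
bnf.300 (no free modifiers, no connectives, no prenex, and so on). The
names GRAMMAR, SELMAHO, and count_derivations are hypothetical. It counts
the derivations of a token string by trying every split point, which works
for any CFG without empty rules or unit-rule cycles, rather than committing
after one token of lookahead the way an LALR(1) parser must.

```python
from functools import lru_cache

# Hypothetical toy fragment, NOT bnf.300. Elidable terminators (KU, KEI)
# are modelled as an extra alternative with the terminator present.
GRAMMAR = {
    "sentence":   [["terms", "bridi_tail"], ["bridi_tail"]],
    "bridi_tail": [["selbri", "terms"], ["selbri"]],
    "terms":      [["term", "terms"], ["term"]],
    "term":       [["sumti"]],
    "sumti":      [["KOhA"], ["LE", "selbri", "KU"], ["LE", "selbri"]],
    "selbri":     [["tanru_unit", "selbri"], ["tanru_unit"]],
    "tanru_unit": [["BRIVLA"], ["NU", "sentence", "KEI"], ["NU", "sentence"]],
}

# Word -> token class, only for the words used in this thread.
SELMAHO = {
    "ko": "KOhA", "le": "LE", "nu": "NU", "ku": "KU", "kei": "KEI",
    "catlu": "BRIVLA", "smacu": "BRIVLA", "bajra": "BRIVLA",
}


def count_derivations(text, start="sentence"):
    """Count the distinct derivations of `text` from `start` in GRAMMAR."""
    tokens = tuple(SELMAHO[word] for word in text.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        # The first symbol takes tokens[i:k]; every remaining symbol
        # needs at least one token because there are no empty rules.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count(start, 0, len(tokens))


if __name__ == "__main__":
    for s in ("ko catlu le nu le smacu bajra",
              "ko catlu le nu le smacu ku bajra",
              "le smacu"):
        print(s, "->", count_derivations(s))
```

In this toy fragment both the version with "ku" and the version without
it come out with exactly one derivation, and "le smacu" on its own comes
out with none. That says nothing definite about bnf.300 itself; the same
counter would have to be run over the full grammar to settle that.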
-Jonathan