
Re: [lojban] Re: Is Lojban a CFG? (was Re: [lojban-beginners] Re: Enumerating in Lojban)



On 7/12/06, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

> Probably because the alternative can be very unintuitive.

[snip]

> Jonathan, I'd like to see an example of a CFG that would handle
> the above by any rule at all.  Just out of curiosity.

It would look exactly like the grammar that handles it now if you
replaced every "//" pair with "[]" (in the notation of bnf.300), plus
a separate mechanism that assigns a particular semantic meaning to
each string, by a formalism such as associating an ordering and a
grouping with each rule. I'm not up to defining the Lojban version of
those statements in this email (numbers are one of the areas I have
not learned very solidly; I'd need to read most of it out of the
BNF), but it would be the same sort of idea as the following grammar
for formulas made of sums and products, with parentheses:
(rule number) nonterminal -> BNF-ish expression

(1) expr -> expr '+' expr
(2) expr -> expr '*' expr
(3) expr -> '(' expr ')'
(4) expr -> NUMBER

Now, that's very ambiguous, so we disambiguate it by declaring rules
(1), (2) and (3) left-grouping and imposing the precedence order
(1) (2) (3) (4), such that (1) binds first, i.e. preferentially
matches the most, and (4) binds last, i.e. matches the least. (In
conventional terms, '+' therefore has lower precedence than '*' and
ends up outermost.)

To derive the string "5+3*5" from the CFG, we have either the
sequence

  expr -> expr '+' expr -> NUMBER '+' expr '*' expr
       -> NUMBER '+' NUMBER '*' NUMBER

corresponding to the meaning "5 + (3*5)", or

  expr -> expr '*' expr -> expr '+' expr '*' NUMBER
       -> NUMBER '+' NUMBER '*' NUMBER

corresponding to the meaning "(5 + 3) * 5". The priority ordering
says to bind (1) before (2), so the first derivation is the one
associated with the semantic meaning of this string.
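To see the ambiguity concretely, a brute-force sketch (the helpers
all_trees, root_rule and RULE_NUMBER are hypothetical) can enumerate
every tree the unannotated rules (1)-(4) allow, then apply the
priority ordering at the root. It only compares root rules, which is
enough for "5+3*5"; a full disambiguator would apply the ordering and
the left-grouping rule at every level.

```python
from functools import lru_cache

# Hypothetical helper: rule numbers from the text, used for the priority check.
RULE_NUMBER = {"+": 1, "*": 2}


def all_trees(tokens):
    """Every parse tree the unannotated rules (1)-(4) allow for tokens."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # all trees for an expr spanning tokens[i:j]
        found = []
        if j - i == 1 and tokens[i].isdigit():            # rule (4)
            found.append(tokens[i])
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            found.extend(trees(i + 1, j - 1))             # rule (3): parens only group
        for k in range(i + 1, j - 1):                     # rules (1) and (2)
            if tokens[k] in RULE_NUMBER:
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        found.append((tokens[k], left, right))
        return tuple(found)

    return list(trees(0, len(tokens)))


def root_rule(tree):
    """Rule number used at the root of a tree (a bare NUMBER is rule 4)."""
    return 4 if isinstance(tree, str) else RULE_NUMBER[tree[0]]


candidates = all_trees(["5", "+", "3", "*", "5"])
for t in candidates:
    print(t)
print("chosen:", min(candidates, key=root_rule))
```

Running it prints the two trees in the order the derivations are
listed above and chooses ('+', '5', ('*', '3', '5')), the rule (1)
reading.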

> I think, though, that most people would rather the parser reject a sentence like:
> [snip - example sentences that would be affected]
> This (the current behaviour) seems to me to reduce the chances for
> confusion *substantially*.  But then I haven't thought about it much
> yet.

Whereas I would have that be a warning in an actual parser
(additional information beyond accept or reject, sort of a "hey,
that looks like a human making a mistake" message), to be handled in
an implementation-dependent way by whatever program the parser is
embedded in, rather than a formal rejection
("you-can-stop-parsing-now-it's-not-lojban").

Basically, I'd rather such error checking be a nice feature of a
particular parser, possibly an extremely widespread one, than part of
the definition of the set of strings comprising Lojban and of the
association between those strings and particular meanings. It would
be an easy feature to add, too. How you'd add it depends on the
algorithm you use: for instance, if you use the CYK algorithm and then
appeal to the precedence and grouping rules to decide which derivation
carries the meaning, you could simply emit that warning whenever there
is more than one derivation. I have an inkling of how to track it in
bison's setup as well.
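A minimal sketch of the warning idea, assuming a hand-made
Chomsky-normal-form conversion of rules (1)-(4); the helper
nonterminals A, M, R, P, T, L, C and the functions cyk_count and check
are hypothetical. CYK here counts parse trees per span instead of just
recording membership, and a count above one triggers a warning rather
than a rejection.

```python
import warnings
from collections import defaultdict

# Hypothetical CNF version of rules (1)-(4); the binarization keeps a
# one-to-one correspondence between trees, so counts are preserved.
BINARY = [
    ("E", "E", "A"),   # rule (1): E -> E A,  A -> P E,  P -> '+'
    ("E", "E", "M"),   # rule (2): E -> E M,  M -> T E,  T -> '*'
    ("E", "L", "R"),   # rule (3): E -> L R,  R -> E C,  L -> '(', C -> ')'
    ("A", "P", "E"),
    ("M", "T", "E"),
    ("R", "E", "C"),
]
TERMINAL = {"+": "P", "*": "T", "(": "L", ")": "C"}


def cyk_count(tokens, start="E"):
    """Number of distinct parse trees of `start` over the token list."""
    n = len(tokens)
    # count[i][j][X] = number of trees for nonterminal X spanning tokens[i:j]
    count = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        sym = "E" if tok.isdigit() else TERMINAL.get(tok)   # rule (4) or a terminal
        if sym is None:
            raise ValueError(f"unknown token {tok!r}")
        count[i][i + 1][sym] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for head, left, right in BINARY:
                    c = count[i][k][left] * count[k][j][right]
                    if c:
                        count[i][j][head] += c
    return count[0][n][start]


def check(tokens):
    """Reject only if no derivation exists; warn (not reject) on ambiguity."""
    n = cyk_count(tokens)
    if n == 0:
        raise SyntaxError("not in the language")
    if n > 1:
        warnings.warn(f"{n} derivations; meaning chosen by precedence/grouping rules")
    return n
```

With single-digit tokens, cyk_count(list("5+3*5")) returns 2,
matching the two derivations of "5+3*5", while "(5+3)*5" yields 1.
The choice of meaning would still come from the precedence and
grouping rules; the count only decides whether to warn.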

-Jonathan