From cbmvax!uunet!Think.COM!gls Thu Apr 11 19:00:54 1991
Date: Thu Apr 11 19:00:54 1991
From: Guy Steele
Message-Id: <9104111938.AA29439@ukko.think.com>
To: nsn@mullian.ee.mu.OZ.AU
Cc: lojban-list@snark.thyrsus.com, nsn@mullian.ee.mu.oz.au
In-Reply-To: nsn@mullian.ee.mu.OZ.AU's message of Thu, 11 Apr 91 18:25:02 +1000 <9104110825.12552@mullian.ee.mu.OZ.AU>
Subject: Elision, or: Nick rides again in jbonai

Yes, you were being so snappy and irritable that you completely missed
the point of my relatively pointless postscript: you overlooked the
seventh sentence. Go back and look at it again.

   ^  The elidable terminators make the language unambiguous, but may often be
   ^                                              ^^ ????????
   ^Did you mean "ambiguous"?

   No he does not. Whatever do you mean, Guy?

Ah, now I understand what was meant: the modifier "elidable" was meant
purely for identificational purposes rather than for purposes of
characterization. I mistook the thought to be:

        If the language didn't have elidable terminators, it would be
        ambiguous; the presence of elidable terminators (as opposed to
        any other kind) somehow makes the language unambiguous.

This seemed to me to state a falsehood, and I wanted to correct it to:

        If the language didn't have elidable terminators, it would be
        unambiguous; the presence of elidable terminators (as opposed to
        any other kind) somehow makes the language ambiguous.

which has the ring of truth. But the thought apparently intended was
actually:

        The terminators in question, namely those that happen to be
        elidable (that's a useful way for me to identify them to you),
        are required in the language to avoid ambiguity, though in
        practice they may be omitted in many places.

So the construction in that English sentence was itself ambiguous.
(Do you remember that Saturday Night Live sketch with Ed Asner as
supervisor of a nuclear power plant, about to go on vacation? His
parting words were, "Remember: you can't give the reactor too much
coolant!"
Those left behind spent the rest of the sketch discussing whether he
meant that they mustn't exceed a given threshold or that there was no
such threshold.)

   ^Hm. It seems to me that if the "official" grammar allows such elision
   ^in practice, then it behooves the language definers to produce a more
   ^elaborate grammar that takes this into account, if it can be done using
   ^a context-free grammar. But if the resulting grammar is context-sensitive,
   ^then allowing such elision may be a bad idea in the first place.

   Think again. In JL13, lojbab's YACC has no problem in filling in the
   missing terminals.

Success with YACC is a lousy existence proof for having done the job
right. Consider C, the original application for YACC: it has the
classic dangling-ELSE problem:

        if (x > 0)
          if (y > z) y = 3;
        else x = 4;

The dangling-ELSE ambiguity is resolved in practice by using an
ambiguous grammar, as given in Kernighan and Ritchie's book (The C
Programming Language, both editions), and then using a piece of code
in the semantic productions that provides a context-sensitive patch:
ELSE goes with the innermost eligible IF. So it causes the misindented
example given above to be interpreted as

        if (x > 0) {
          if (y > z) y = 3;
          else x = 4;
        }

and not as

        if (x > 0) {
          if (y > z) y = 3;
        }
        else x = 4;

(I must stress that this patch is *not* part of the grammar proper.
It's a piece of C code.)

From this point of view, the ability to elide the disambiguating
braces in C is exactly analogous to being able to elide terminators in
lojban.

But that is not the only way to deal with the problem. It is not
difficult to eliminate the dangling-ELSE ambiguity explicitly in the
grammar. Instead of writing:

        statement:
            expression ;
            break ;
            continue ;
            { declaration-list/opt statement-list/opt }
            if ( expression ) statement
            if ( expression ) statement else statement
            while ( expression ) statement
            do statement while ( expression ) ;
            ...
you write something like this:

        statement:
            dangling-statement
            non-dangling-statement

        primitive-statement:
            expression ;
            break ;
            continue ;
            { declaration-list/opt statement-list/opt }
            do statement while ( expression ) ;

        non-dangling-statement:
            primitive-statement
            if ( expression ) non-dangling-statement else non-dangling-statement
            while ( expression ) non-dangling-statement
            ...

        dangling-statement:
            if ( expression ) statement
            if ( expression ) non-dangling-statement else dangling-statement
            while ( expression ) dangling-statement
            ...

The idea is that "primitive statements", which do not have trailing
embedded statements, are the base case of non-dangling-ness, whereas
an IF statement without an ELSE is the source of dangling-danger.
Statements that have trailing embedded statements, such as WHILE, have
the same danglingosity as the embedded statement. (This means that
such statements must appear in the grammar in both dangling and
non-dangling forms.) Only a non-dangling statement may appear before
an ELSE: this is what eliminates the ambiguity.

Thus, with this grammar it is impossible for my example to be
misinterpreted, and no special-case code is needed to ensure that
there is only one parse.

I am proposing that it is better for some purposes, if feasible, to
produce a more complex grammar for lojban that would eliminate the
need for context-sensitive side-conditions that dictate when elision
is permissible.

The C grammar has another bad property that cannot be eliminated with
a more complex grammar: TYPEDEF symbols. It is impossible to determine
the meaning of the statement

        printf(x);

without knowing whether or not it has been preceded by

        typedef int printf;

if it has, then printf(x); is a declaration of x as an integer, not a
statement at all. This is also taken care of by parse-time semantic
code. (Imagine a variant of lojban in which you could declare "little
words" on the fly!)

        typedef at John;
        I gave John the office.    /* This means "I gave at the office."
        */

   As for what the grammar allows or disallows - my God, have you actually
   ever written an lojban sentence?! There is no need to dot every i and
   cross every chicken across the road.

I agree. But I am not saying that the sentences generated by the
grammar should be explicit; I am saying the grammar itself should be
stated in an explicit form, rather than having parts of it written in C.

   A language in which I'd have to put in {ku} after every single damn
   sumti is a language I would not stick around in. In
   {le klama le seklama cu klama}, it is fairly obvious to me (and I
   don't really think it needs codifying)

Maybe it's obvious to you, but it may not be obvious to a lot of your
friends (some of whom might be computers :-). So here is where we
disagree: I feel *very strongly* that it does deserve codifying in BNF
(as opposed to code-ifying in C!).

   that {le klama} and {le se klama} are two distinct sumtis, and that
   starting a new sumti with {le} means the old sumti must have finished
   (elided {ku}). I don't know enough about parsing to tell whether the
   BNF handles such elisions as well as did the YACC: JC's presentation
   of what should be obvious was admittedly handwaving a bit. But people
   have no problem with elision, and machines have no major problem with
   it either, so where's the problem?

   However handwaving rule 10 was (and all you've got to do is read five
   lines of lojban to realise that it doesn't really matter), JC's BNF
   is cool. It's a pleasure to be actually able to check through a
   structure's validity in half a minute.

I don't want to belittle the value of partial grammars. But using an
incomplete grammar (such as the one in the back of K&R) doesn't tell
you that a structure is valid. It may tell you that it is *invalid*,
and if it does so you will have found out quickly, which is a useful
thing. But conforming to an incomplete grammar does not guarantee
correctness. The analogy: my "bad" C example will be considered valid
by the K&R grammar, but misparsed.
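
To make that misparse concrete, here is a minimal sketch in C. The
helper names run_example and check_binding are hypothetical, introduced
only for illustration; the body of run_example is the misindented
example from earlier in this message, with y passed by pointer so the
caller can observe it. Under the innermost-IF rule the ELSE fires only
when x > 0 and y <= z, and never when x <= 0, whatever the indentation
suggests.

```c
#include <assert.h>

/* Hypothetical wrapper around the misindented example.
   Returns the final value of x; *y is updated in place. */
int run_example(int x, int *y, int z)
{
    if (x > 0)
        if (*y > z)
            *y = 3;
    else            /* lined up with the outer IF, but belongs to the inner one */
        x = 4;
    return x;
}

/* Hypothetical self-check: each case states what the innermost-IF rule implies. */
void check_binding(void)
{
    int y;

    y = 1;
    assert(run_example(-5, &y, 2) == -5);  /* outer test fails: ELSE is not reached */
    assert(y == 1);

    y = 1;
    assert(run_example(5, &y, 2) == 4);    /* inner test fails: ELSE sets x to 4 */
    assert(y == 1);

    y = 9;
    assert(run_example(5, &y, 2) == 5);    /* inner test succeeds: only y changes */
    assert(y == 3);
}
```

Calling check_binding() from a main() of your own should complete
without tripping an assertion. Had the ELSE belonged to the outer IF,
the first case would have returned 4 instead of -5. Some compilers warn
about the dangling ELSE or the misleading indentation here, which is
rather the point.
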
Similarly, if I incorrectly omit a lojban terminator, the existing
grammar may consider the result valid, but it will be misparsed.

For purposes of formal rigor and completeness, an unambiguous
context-free grammar is desirable, even though it may be larger than
what is useful for tutorial purposes.

--Guy Steele