[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Elidable terminators



   Date: Wed, 17 Apr 91 16:37:28 -0700
   From: Jeff Prothero <jsp@milton.u.washington.edu>

   jp> = Jeff Prothero
   gs> = Guy Steele Jr.
   db> = Dave Bowen

   gs>Indeed.  Moreover, it may even be the case that the language actually
   gs>is LR(1), that is, there does exist an LR(1) grammar that specifies
   gs>it but such a grammar has not been found yet.  Part of my point is
   gs>that the attempt to find an unambiguous LR(1) grammar may yield new
   gs>insights about the grammatical structure of the language.
	<C example deleted>
   gs>om a contained statement to its container did not
   gs>become clear to me until it was forced upon me by the exercise of
   gs>constructing the grammar.  We might find lojban constructs also
   gs>falling into two general categories: those after which (before
   gs>which?) elision is permissible, and all others.  The nature of that
   gs>dichotomy is worth exploring.

   I don't understand this comment at all.  (If it were someone less
   illustrious than Guy, I'd say something stronger. :) 

You are too gracious, which is kind of you.  But never accept a proof
by intimidation!  (I am about to disagree with you, but don't accept
it without thinking about it!)
							Without dragging
   out my attempt to make this formal, if it is even still around, the
   elidability of a particular ET in a particular utterance depends
   entirely on the particular sentential construct following the ET --
   can it be mistaken for a continuation of the current construct, or
   not?  (To make it formal, you have to decide things like how much
   lookahead you are going to allow in making this decision...)

So far I agree.

   This is a purely contextual computation, and trying to perform it with
   context-free-grammars is an abuse of the formalism.

But this I will label "balderdash".  Carried to an extreme, this
notion states that *any* word can come at *any* point in a sentence,
because to take an adjacent word into consideration would make the
decision "context sensitive".  The term "context-free" has a very
particular technical meaning, namely that in generating a string from
a grammar a production may be applied to any nonterminal at any time
without regard to what is adjacent to that nonterminal.  But what we
would *loosely* call "contextual considerations" are enforced by the
structure of individual productions.  For if not, how could we
guarantee that a preposition precedes what it governs, or that
ELSE must follow an IF?
							Yes, you can do
   it -- if construct M in the grammar can be followed by constructs X, Y
   and Z, you can split M into three variants M-followed-byX,
   M-followed-by-Y, and M-followed-by-Z, and you may be able to show that
   in all of these cases, the "elidable" terminator is either clearly
   required or clearly forbidden.  But the exercise serves no particular
   purpose that I can see

It serves precisely the purpose of distinguishing *formally* the cases
in which elision is permitted.  Indeed, it may be useful to understand
that Y and Z are the cases that permit elision, whereas X does not.
It's one thing to eyeball a simplfied grammar and say to yourself,
"Well, eliding a terminator here doesn't seem to cause any problems"
and it is quite another to create a complete grammar and have a parser-
generating tool confirm that the resulting grammar is indeed unambiguous
(and perhaps LR(n)).
			 , and leads to an exponential increase in the
   size of the grammar.

Fixing the IF-ELSE problem in C increases its grammar by 5% or less.
If in fact the rules of lojban cause an exponential increase in
complexity, that may be worth knowing, too.  But I doubt that it does.

			Maybe Guy would like to compute the approximate
   number of rules the final grammar would have? ;-)

I'm not sure I know what is the latest thing, or have the latest copy.
But if you will e-mail to me the latest version, I'll take a look at it.

   DB> That's certainly one way of dealing with the issue of elibable
   DB> terminators.  Guy's comments raised the question, at least for me, of
   DB> whether it is the best way.  Is it possible to create a grammar for
   DB> Lojban which has the terminator as an optional production in those places
   DB> and only those places where it can safely be elided?  Or are we stuck
   DB> with the <error> kludge, even if it is safe?  If the answer to this first
   DB> question is, "We don't know.", then my question is what would be a
   DB> reasonable way to investigate this question, given the lack of tools
   DB> for hacking on grammars?  Sorry for all the quotes, but I wanted to
   DB> ensure people understood the basis for my questions.

   A reasonable question, Dave, but I think you have a false dichotomy
   here.  After a decade, things will be clearer... :)

   The first approach I tried was the error kludge.

   The second and third (PLoP) were variants on topdown backtracking parsing,
   which got beat to death by the exponential explosions. 

   The fourth (FLoP) was a bottom-up backtracker -- i.e., a regular
   LALR(1) engine, but with the ambiguities left in the parser tables,
   and the stackmachine hacked to backtrack at ambiguities.  This gets
   most of the speed of a regular LALR(1) parser (since most tokens are
   unambiguous, and the shift/reduce can be done with no backtracking
   needed) while allowing the non-LALR(1) (and non-ET) parts of the
   grammar to be handled without special preparsing hacks.  

   Elidable terminators are handled by establishing a budget for ETs --
   first, try to parse with no ETs inserted, then with only 1, then with
   only two, etc.  A fair amount of backtracking, but not intolerable,
   given that you only have to parse one connected sentence at a time.
   Last time I seriously thought this through, this seemed the only
   approach that is simple and clearly gives you the parse you want.

   Again, I urge you: If you want to tackle this, start by forgetting
   YACC, sit down and formally specify what elidable terminators are
   supposed to do -- what strings are you adding to the language and what
   are their parses?  This is not as obvious as it may seem!

Exactly!  What I am proposing is that producing a context-free grammar,
or at least attempting to do so, may be a particularly productive
formal method.
							      When you
   have this crystal clear, *then* start thinking about implementations
   which will produce the result you have specified, or an acceptable
   approximation.

--Guy