[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lojban] LALR1 question
On Mon, 27 Aug 2001, Invent Yourself wrote:
> > > I have always been led to believe that LALR(1) parsers are the maximum
> > > that can be conclusively proved to be unambiguous, which is why TLI
> > > Loglan
> > > grammar was based on YACC (and incidentally, Robin, has no shift-reduce
> > > conflicts)
> >
> > LR(k) and LL(k) (for all k) grammars are also unambiguous.
>
> How hard would it be to create an LALR(5) Lojban, and how different would
> it be to speak?
It would be easier to define the machine grammar for something which was
LALR(5), no nasty little restrictions, or need for kludges to force the
language to shift instead of reduce in the appropriate places.
Writing a LALR(5) parser generator and then running it (and using the
parser) would be far more difficult taskes.
LALR(5) would require that you create a matrix with dimensions n by n,
where n is the number of valid different 6 token strings in the language.
We've got 120 selma'o, (each selma'o =~ token), so in the worst case, its
a table of 2985984000000 by 2985984000000. (Each entry is at least a few
bytes, btw.)
A more realisitic case would still exhaust the memory of most computers
today.
The answer for your second question is significantly more difficult.
In fact, I don't suggest you read it. I'm answering mostly so as to ponder
the question outloud for myself. I suggest asking a psycholinguist, or at
least a linguist.
My guess is, that if the language actually made significant use of having
5 tokens of lookahead, that speaking it and understanding it would be
beyond many humans. Supposedly humans have got a short term memory of 7
give or take 2 items. LALR(5) would require that humans remember the last
5 words said, in addition to the current one. Sort of...
Memory and language processing is likely very different from the way a
parser works. But I'd imagine that if people had to hold 6 words or so in
memory, just to be able to identify a word as a determinor, or a group of
words as a noun phrase, they'd have real problems.
If you were merely upping the amount of look ahead so that you could leave
out terminators in a lot more cases, then it would probably make things
easier to remember. (Humans can stick missing terminators back in rather
handily, in many cases.)
(The last 3 paragraphs were pulled almost entirely out of my ass. Should
they happen to be correct, then I'll be impressed. See, however, the note
about asking a linguist.)
My conclusion: If you want the language to be syntactically unambiguous,
LALR(1) is a fairly good choice. The most you'd want to do is switch to an
LR(2) parser. If you need more than that, you're doing something wrong.
- Jay Kominek <jay.kominek@colorado.edu>
Plus ça change, plus c'est la même chose