From jay.kominek@colorado.edu Mon Aug 27 16:30:17 2001 Return-Path: X-Sender: kominek@ucsub.colorado.edu X-Apparently-To: lojban@yahoogroups.com Received: (EGP: mail-7_3_2); 27 Aug 2001 23:30:17 -0000 Received: (qmail 85914 invoked from network); 27 Aug 2001 23:29:58 -0000 Received: from unknown (10.1.10.27) by l7.egroups.com with QMQP; 27 Aug 2001 23:29:58 -0000 Received: from unknown (HELO ucsub.colorado.edu) (128.138.129.12) by mta2 with SMTP; 27 Aug 2001 23:29:58 -0000 Received: from ucsub.colorado.edu (kominek@ucsub.colorado.edu [128.138.129.12]) by ucsub.colorado.edu (8.11.6/8.11.2/ITS-5.0/student) with ESMTP id f7RNTB328223 for ; Mon, 27 Aug 2001 17:29:11 -0600 (MDT) Date: Mon, 27 Aug 2001 17:29:11 -0600 (MDT) To: Subject: Re: [lojban] LALR1 question In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE From: Jay Kominek X-Yahoo-Message-Num: 10193 On Mon, 27 Aug 2001, Invent Yourself wrote: > > > I have always been led to believe that LALR(1) parsers are the maxim= um > > > that can be conclusively proved to be unambiguous, which is why TLI > > > Loglan > > > grammar was based on YACC (and incidentally, Robin, has no shift-redu= ce > > > conflicts) > > > > LR(k) and LL(k) (for all k) grammars are also unambiguous. > > How hard would it be to create an LALR(5) Lojban, and how different would > it be to speak? It would be easier to define the machine grammar for something which was LALR(5), no nasty little restrictions, or need for kludges to force the language to shift instead of reduce in the appropriate places. Writing a LALR(5) parser generator and then running it (and using the parser) would be far more difficult taskes. LALR(5) would require that you create a matrix with dimensions n by n, where n is the number of valid different 6 token strings in the language. We've got 120 selma'o, (each selma'o =3D~ token), so in the worst case, its a table of 2985984000000 by 2985984000000. (Each entry is at least a few bytes, btw.) A more realisitic case would still exhaust the memory of most computers today. The answer for your second question is significantly more difficult. In fact, I don't suggest you read it. I'm answering mostly so as to ponder the question outloud for myself. I suggest asking a psycholinguist, or at least a linguist. My guess is, that if the language actually made significant use of having 5 tokens of lookahead, that speaking it and understanding it would be beyond many humans. Supposedly humans have got a short term memory of 7 give or take 2 items. LALR(5) would require that humans remember the last 5 words said, in addition to the current one. Sort of... Memory and language processing is likely very different from the way a parser works. But I'd imagine that if people had to hold 6 words or so in memory, just to be able to identify a word as a determinor, or a group of words as a noun phrase, they'd have real problems. If you were merely upping the amount of look ahead so that you could leave out terminators in a lot more cases, then it would probably make things easier to remember. (Humans can stick missing terminators back in rather handily, in many cases.) (The last 3 paragraphs were pulled almost entirely out of my ass. Should they happen to be correct, then I'll be impressed. See, however, the note about asking a linguist.) My conclusion: If you want the language to be syntactically unambiguous, LALR(1) is a fairly good choice. The most you'd want to do is switch to an LR(2) parser. If you need more than that, you're doing something wrong. - Jay Kominek Plus =C3=A7a change, plus c'est la m=C3=AAme chose