From: "And Rosta" (a.rosta@dtn.ntl.com)
Subject: RE: [lojban] LALR1 question
Date: Tue, 28 Aug 2001 11:07:04 +0100

Jay:
> On Mon, 27 Aug 2001, Invent Yourself wrote:
[...]
> > How hard would it be to create an LALR(5) Lojban, and how different would
> > it be to speak?
[...]
> The answer for your second question is significantly more difficult.
>
> In fact, I don't suggest you read it. I'm answering mostly so as to ponder
> the question out loud for myself. I suggest asking a psycholinguist, or at
> least a linguist.

Better a psycholinguist, because I'm a nonpsycho linguist & am not a fount
of knowledge. But let me muster what little I know. (This meagre knowledge
is 15 years old.)

Human parsing is, of course, done in real time and with minimal lookahead.
The parser can backtrack, but normally doesn't (the garden-path effect is
a forced backtrack).
Processing of all levels is done in parallel, with phonology slightly
ahead, so phonology activates words in the lexicon whose sound matches the
phonological string, but the lexical word is identified on the basis of
the syntax and meaning of the sentence up to that point. Syntactic and
semantic structure are built up simultaneously, and pragmatic
interpretation also begins at the start of the sentence and proceeds in
step with the parse. Local grammatical ambiguities are resolved by taking
into account pragmatics (guessing what it was the speaker intended to say,
and also guessing what the speaker is about to say next), and by default
preferences for building syntactically simpler rather than more complex
structures (complexity here being not an abstract notion of complexity but
instead something psychologically concrete like the demands placed on
short-term memory). So strings that are grammatically ambiguous in
principle (as almost all are, in fact, in English at least) are only very
rarely parsed in practice as globally ambiguous.

The essence of this system, then, in comparison to possible ways that
computers might do it, is that human parsing is done incrementally left to
right with minimal lookahead and minimal backtracking, with all local
ambiguities (as well as word-identification) resolved 'on the spot' on the
basis of all possible evidence available, both grammatical and pragmatic.

> My guess is, that if the language actually made significant use of having
> 5 tokens of lookahead, that speaking it and understanding it would be
> beyond many humans. Supposedly humans have got a short term memory of 7
> give or take 2 items. LALR(5) would require that humans remember the last
> 5 words said, in addition to the current one. Sort of...
>
> Memory and language processing is likely very different from the way a
> parser works.
> But I'd imagine that if people had to hold 6 words or so in
> memory, just to be able to identify a word as a determiner, or a group of
> words as a noun phrase, they'd have real problems.

For sure.

> If you were merely upping the amount of look ahead so that you could leave
> out terminators in a lot more cases, then it would probably make things
> easier to remember. (Humans can stick missing terminators back in rather
> handily, in many cases.)

By "stick missing terminators back in" I presume you mean "backtrack so as
to attach a phrase to a node higher than the one it was attached to
before", or "postpone resolving an attachment ambiguity until more incoming
words have been processed and there is sufficient information to resolve
the ambiguity"? My recollection is that the norm is the postponement
strategy, with backtracking as a last-resort emergency strategy for
moments of crisis.

> (The last 3 paragraphs were pulled almost entirely out of my ass. Should
> they happen to be correct, then I'll be impressed. See, however, the note
> about asking a linguist.)

My remarks too are rather exculate, linguist's though the ass be.

--And.