From a.rosta@dtn.ntl.com Tue Aug 28 03:07:54 2001
Return-Path: <a.rosta@dtn.ntl.com>
X-Sender: a.rosta@dtn.ntl.com
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: mail-7_3_2); 28 Aug 2001 10:07:53 -0000
Received: (qmail 80294 invoked from network); 28 Aug 2001 10:07:53 -0000
Received: from unknown (10.1.10.142)
  by l10.egroups.com with QMQP; 28 Aug 2001 10:07:53 -0000
Received: from unknown (HELO mta05-svc.ntlworld.com) (62.253.162.45)
  by mta3 with SMTP; 28 Aug 2001 10:07:53 -0000
Received: from andrew ([62.255.41.118]) by mta05-svc.ntlworld.com
  (InterMail vM.4.01.03.00 201-229-121) with SMTP
  id <20010828100750.YYKX20588.mta05-svc.ntlworld.com@andrew>
  for <lojban@yahoogroups.com>; Tue, 28 Aug 2001 11:07:50 +0100
Reply-To: <a.rosta@ntlworld.com>
To: <lojban@yahoogroups.com>
Subject: RE: [lojban] LALR1 question
Date: Tue, 28 Aug 2001 11:07:04 +0100
Message-ID: <LPBBJKMNINKHACNDIIGMGECDEKAA.a.rosta@dtn.ntl.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
In-Reply-To: <Pine.GSO.4.33.0108271715470.17048-100000@ucsub.colorado.edu>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
From: "And Rosta" <a.rosta@dtn.ntl.com>

Jay:
> On Mon, 27 Aug 2001, Invent Yourself wrote:
[...]
> > How hard would it be to create an LALR(5) Lojban, and how different would
> > it be to speak?
[...]
> The answer for your second question is significantly more difficult.
>
> In fact, I don't suggest you read it. I'm answering mostly so as to ponder
> the question out loud for myself. I suggest asking a psycholinguist, or at
> least a linguist.

Better a psycholinguist, because I'm a nonpsycho linguist & am not a
fount of knowledge. But let me muster what little I know. (This meagre
knowledge is 15 years old.)

Human parsing is done of course in real time and with minimal lookahead.
The parser can backtrack, but normally doesn't (the garden-path effect
is a forced backtrack). Processing of all levels is done in parallel,
with phonology slightly ahead, so phonology activates words in the lexicon
whose sound matches the phonological string, but the lexical word is
identified on the basis of the syntax and meaning of the sentence up
to that point. Syntactic and semantic structure are built up simultaneously
and pragmatic interpretation also begins at the start of the sentence
and proceeds in step with the parse. Local grammatical ambiguities are
resolved by taking into account pragmatics (guessing what it was the speaker
intended to say, and also guessing what the speaker is about to say next),
and by default preferences for building syntactically simpler rather than
more complex structures (complexity here being not an abstract notion of
complexity but instead something psychologically concrete like the
demands placed on short-term memory). So strings that are grammatically
ambiguous in principle (as almost all are, in fact, in English at least)
are only very rarely parsed in practice as globally ambiguous.

The essence of this system, then, in comparison to possible ways that
computers might do it, is that human parsing is done incrementally left
to right with minimal lookahead and minimal backtracking with all local
ambiguities (as well as word-identification) resolved 'on the spot' on
the basis of all possible evidence available, both grammatical and
pragmatic.

> My guess is, that if the language actually made significant use of having
> 5 tokens of lookahead, that speaking it and understanding it would be
> beyond many humans. Supposedly humans have got a short term memory of 7
> give or take 2 items. LALR(5) would require that humans remember the last
> 5 words said, in addition to the current one. Sort of...
>
> Memory and language processing is likely very different from the way a
> parser works. But I'd imagine that if people had to hold 6 words or so in
> memory, just to be able to identify a word as a determiner, or a group of
> words as a noun phrase, they'd have real problems.

For sure.

> If you were merely upping the amount of look ahead so that you could leave
> out terminators in a lot more cases, then it would probably make things
> easier to remember. (Humans can stick missing terminators back in rather
> handily, in many cases.)

By "stick missing terminators back in" I presume you mean "backtrack so as
to attach a phrase to a node higher than the one it was attached to before",
or "postpone resolving an attachment ambiguity until more incoming words
have been processed and there is sufficient information to resolve the
ambiguity"? My recollection is that the norm is the postponement strategy,
with the backtracking a last resort emergency strategy for moments of
crisis.
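
To make the contrast between the two strategies concrete, here is a toy
sketch in Python. It is not a model of human parsing: the lexicon, the
bigram "grammar" and both function names are invented for illustration,
and "postponement" is approximated by simply keeping every still-viable
analysis open until later words rule it out.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon: possible categories per word, preferred first.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boats": ["NOUN"],
}

# Hypothetical toy grammar: which category may follow which.
ALLOWED = {
    ("START", "DET"), ("DET", "ADJ"), ("DET", "NOUN"), ("ADJ", "NOUN"),
    ("NOUN", "VERB"), ("VERB", "DET"), ("NOUN", "END"),
}


def parse_backtracking(words: List[str]) -> Tuple[Optional[List[str]], int]:
    """Commit to the preferred category at once; undo only on failure."""
    backtracks = 0

    def extend(i: int, prev: str) -> Optional[List[str]]:
        nonlocal backtracks
        if i == len(words):
            return [] if (prev, "END") in ALLOWED else None
        for cat in LEXICON[words[i]]:
            if (prev, cat) in ALLOWED:
                rest = extend(i + 1, cat)
                if rest is not None:
                    return [cat] + rest
                backtracks += 1  # the commitment to `cat` had to be undone
        return None

    return extend(0, "START"), backtracks


def parse_postponing(words: List[str]) -> Tuple[Optional[List[str]], int]:
    """Keep every viable analysis open; let later words prune them."""
    live = [["START"]]
    max_live = 1  # most analyses held open at any one time
    for word in words:
        live = [path + [cat]
                for path in live
                for cat in LEXICON[word]
                if (path[-1], cat) in ALLOWED]
        max_live = max(max_live, len(live))
    finished = [path[1:] for path in live if (path[-1], "END") in ALLOWED]
    return (finished[0] if finished else None), max_live


sentence = "the old man the boats".split()
print(parse_backtracking(sentence))
print(parse_postponing(sentence))
```

On the garden-path string "the old man the boats" both functions arrive at
the same analysis (DET NOUN VERB DET NOUN). The first gets there by
withdrawing two earlier commitments; the second never withdraws anything
but has to hold two analyses open for a couple of words.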

> (The last 3 paragraphs were pulled almost entirely out of my ass. Should
> they happen to be correct, then I'll be impressed. See, however, the note
> about asking a linguist.)

My remarks too are rather exculate, linguist's though the ass be.

--And.