From cowan  Sat Mar  6 23:00:42 2010
Subject: Re: Grammar
To: hombre!uunet!aurs01!jack (Jack Waugh)
From: cowan
Date: Mon, 1 Oct 90 10:38:41 EDT
In-Reply-To: <9009301439.AA07299@aurs01.>; from "Jack Waugh" at Sep 30, 90 10:39 am
X-Mailer: ELM [version 2.3 PL2]
Status: RO
X-From-Space-Date: Mon Oct  1 10:38:41 1990
X-From-Space-Address: cowan
Message-ID: <zI3n96HkeTD.A.bfC.a80kLB@chain.digitalkingdom.org>

> I don't recall that we have met, although we might have at a
> Logfest.  My name is Jack Waugh.  I have been
> interested in constructed predicate-based language for human
> speech since reading L1 in 1976.  I happened to move to the DC
> area and so fell in with Bob Lechevalier, for good or ill.

No, I've never been to a Logfest.  I joined lei lojbo only at the beginning
of this year, and I couldn't make Logfest '90.  I will be at LogFair, though.

> I would be interested in seeing a copy of that two-page BNF
> Bob says you wrote.

Attached below; it's about 4 pages long, with plenty of white space.
Please do not redistribute this text, as nobody else has seen it or
checked it; return comments directly to me <cowan@marob.masa.com>.

> It is a bit disappointing, although expected, that you are
> announcing the baselining of the grammar in its current state.
> I think it should have undergone a simplification pass.
> I think I oppose the "let a thousand flowers bloom"
> philosophy.  Every learner of the language has to know all
> the little words in order to parse arbitrary grammatical
> utterances.  Therefore, complexity in the grammar is very
> expensive in learning time.

I felt this way at first too, until I really began to dig at the YACC
version.  This is my third attempt at rendering it in BNF:  the first
two times, I attempted to make an essentially mechanical translation,
and the result was riddled with errors.  I found I had to go through
the YACC version line-by-line until I understood (at least to a degree)
the semantic import of everything in it.  Then I could re-express the
syntax which captures that semantics in BNF notation.

It is certainly true that to parse Lojban >completely<, one must know
the whole grammar.  However, for human beings complete parsing is not
always necessary.  We understand many sentences that do not parse, e.g.
*"me see she", and there are sentences that we can parse but cannot understand
on the fly, e.g. *"The man that the dog that the fly that the virus infected
bit bit bit me".  We can generate this sentence, but trying to understand it
in speech provokes stack overflow in the listener.

Computer listeners, of course, must be able to parse everything, but that is
clearly within the capabilities of existing systems: after all, a parser
exists.  The machine parser will be able to cope with the Lojban version of the
second sample sentence above, but humans will still not be able to understand
it even in Lojban (I think -- if that turns out not to be the case, it will
be a powerful Whorfian effect indeed!).
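
The gap between generating and understanding the second sample can be
sketched in a few lines of Python.  This is an illustration only: the
function names and the push/pop model of the listener are assumptions made
for the sketch, not part of any Lojban tool.

```python
def center_embed(subjects, verbs, tail):
    """Build a center-embedded English sentence.

    subjects[0] is the main subject; each later subject opens a relative
    clause modifying the one before it.  verbs[i] is the verb whose
    subject is subjects[i].
    """
    if len(subjects) != len(verbs):
        raise ValueError("need exactly one verb per subject")
    opening = " that ".join(f"the {s}" for s in subjects)
    # The innermost clause closes first, so the verbs come out reversed.
    closing = " ".join(reversed(verbs))
    sentence = f"{opening} {closing} {tail}"
    return sentence[0].upper() + sentence[1:]


def max_pending_subjects(subjects, verbs):
    """Model the listener: hold each subject until its verb arrives."""
    stack, deepest = [], 0
    for subject in subjects:
        stack.append(subject)
        deepest = max(deepest, len(stack))
    for _ in reversed(verbs):
        stack.pop()
    return deepest


SUBJECTS = ["man", "dog", "fly", "virus"]
VERBS = ["bit", "bit", "bit", "infected"]

print(center_embed(SUBJECTS, VERBS, "me"))
print(max_pending_subjects(SUBJECTS, VERBS))
```

Generation is a simple loop, but the listener in this model must keep all
four subjects pending before the first verb arrives.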

> Predicate language is not supposed to be a word-for-word
> translation of a natural language.  It started out
> radically different.  I suspect some of the grammatical
> complications that have crept in will let speakers continue
> in the grammatical viewpoint of natural languages and thus
> have less tendency to adopt the predicate viewpoint and
> think about what they really mean to say.  Of course, I have
> no place from which to stand and complain, since I have not
> given time to the grammar myself.

I think that "word-for-word translation" is hardly likely.  There are a
few points which "accommodate" natural language, like internal "naku"
negation (which allows us to replicate the "illogic" of English
"Some people don't like going there"), but they are highly marked and
rather hard to use correctly.  The easy and natural way of saying things
in Lojban remains the predicate way, or perhaps the predicate-plus-tanru
way.

My favorite example for illustrating that Lojban is not just an "English-based
code", but has its own grammatical structure, is this translation of
Simonides' epigram at Thermopylae:

	Tell them in Lakedaimon, passerby
	That here, obedient to their laws, we lie.

	ko cusku fi me la lykedaimon. do klama do'u
	fe le nu mi nu tinbe le ri flalu kei morsi

Literally:
	(Imperative!) You express-to those pertaining-to Lakedaimon,
	O comer/goer, the event-of we being-an event-of (we obey the
	laws of the last-mentioned) kind-of dead.

This construction, with a tanru based on an abstraction (the inner "nu")
itself contained within an abstraction, is easy and natural Lojban.

> Let grammar-1 mean the mathematical function that maps from
> utterances to parses (or to rejections as ungrammatical),
> independently of how that function might be expressed or
> prescribed.  Then if A and B stand for grammar-1s and U is
> a putative utterance, if for all U A(U) = B(U) then A = B.
> 
> Let a grammar-2 (you might know better terms than grammar-1
> and grammar-2) be a pair (E, L) such that E is an expression
> in some mathematical notation and L (for "language") is
> the meaning of the notation used in E, such that L(E) is a
> grammar-1.
> 
> Let a grammar-3 be the expression E of a grammar-2 (E, L).
> 
> Then the official YACC grammar of Lojban is a grammar-3 (and L is
> YACC plus the preprocessor, in essence).
> 
> A goal often expressed for a grammar-3 of a version of Loglan is
> that it be "unambiguous".  Hidden behind this is some kind of
> constraint or criterion on L, since L could always be constructed
> so that E couldn't be ambiguous.

Okay, I believe I understand your definitions.  In this context, L is
described by the term LALR(1), meaning that it can be mechanically parsed
by a deterministic pushdown automaton (a finite-state control plus a stack)
which can inspect 1) a summary of the entire left context so far, kept on
the stack, and 2) the next available symbol.  The compounder serves to pack
up certain word combinations into single symbols for the parser's benefit,
and is itself LALR(1) or even weaker.
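
To check that reading of the definitions, here is a small Python sketch.
It takes one rule from the BNF below, ek(802) = [NA] [SE] A [NAI], and
expresses it as two different grammar-2s: one where E is a regular
expression and L is Python's re module, and one where E is hand-written
code.  Both function names, the token alphabet, and the tuple used as a
"parse" are assumptions for illustration; neither is the YACC grammar.

```python
import re
from itertools import product

TOKENS = ["NA", "SE", "A", "NAI"]  # selma'o names used by the ek rule


def parse_ek_regex(tokens):
    """E is a regular expression; returns (na, se, nai) or None."""
    m = re.fullmatch(r"(NA )?(SE )?A( NAI)?", " ".join(tokens))
    if m is None:
        return None
    return tuple(m.group(i) is not None for i in (1, 2, 3))


def parse_ek_manual(tokens):
    """E is hand-written code; returns (na, se, nai) or None."""
    rest = list(tokens)
    na = bool(rest) and rest[0] == "NA"
    if na:
        rest.pop(0)
    se = bool(rest) and rest[0] == "SE"
    if se:
        rest.pop(0)
    if not rest or rest.pop(0) != "A":
        return None
    nai = bool(rest) and rest[0] == "NAI"
    if nai:
        rest.pop(0)
    return None if rest else (na, se, nai)


def agree_up_to(a, b, max_len):
    """Compare A(U) and B(U) for every U of at most max_len tokens."""
    return all(
        a(list(u)) == b(list(u))
        for n in range(max_len + 1)
        for u in product(TOKENS, repeat=n)
    )


print(agree_up_to(parse_ek_regex, parse_ek_manual, 5))
```

Agreement on every utterance up to a fixed length is only evidence that
the two grammar-1s are equal, not a proof, since the definition quantifies
over all U.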

What point(s) are you going to make with these definitions?  Please study
the BNF below and fill me in further.

-----cut here-----cut here-----cut here-----cut here-----cut here-----cut here
LOJBAN MACHINE GRAMMAR, BNF VERSION, BASELINED AS OF 20 JULY 1990

COPYRIGHT 1989,1990 THE LOGICAL LANGUAGE GROUP, INC.
CONTACT THAT ORGANIZATION AT 2904 BEAU LANE, FAIRFAX VA 22031 USA 703-385-0273

PERMISSION TO COPY GRANTED SUBJECT TO YOUR VERIFICATION THAT THIS IS THE
LATEST VERSION OF THE LOJBAN GRAMMAR, THAT YOUR DISTRIBUTION BE FOR
PROMOTION OF LOJBAN, THAT THERE IS NO CHARGE FOR THE PRODUCT, AND THAT
THIS COPYRIGHT NOTICE IS INCLUDED INTACT IN THE COPY.

Explanation of notation:

All rules have the form:
	name(number) = bnf-expression
which means that the grammatical construct "name" is defined by
"bnf-expression".  The number cross-references this grammar with
the rule numbers in the YACC grammar.

1)  Names in lower case are grammatical constructs.
2)  Names in UPPER CASE are selma'o (lexeme) names, and are terminals.
3)  Concatenation is expressed by juxtaposition with no operator symbol.
4)  | represents alternation (choice).
5)  [] represents an optional element.
6)  & represents and/or ("A & B" is the same as "A | B | A B").
7)  ... represents optional repetition of the construct to the left.
    Left-grouping is implied; right-grouping is shown by explicit recursion.
8)  () serves to indicate the grouping of the other operators.
9)  # is shorthand for "[free ...]", a construct which appears in many places.
10) // encloses an elidable terminator, which may be omitted (without change
    of meaning) if no grammatical ambiguity results.
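
The & operator of item 6 is the least familiar part of this notation, so
here is a minimal Python sketch of how it expands into plain alternation.
The function name is hypothetical, and the sketch assumes that a chain such
as "A & B & C" means any non-empty selection of the parts kept in their
original order, which is what repeated application of the two-part
definition gives under either grouping.

```python
from itertools import combinations


def expand_and_or(*parts):
    """Expand 'A & B & ...' into its plain alternatives.

    Each alternative is a non-empty selection of the parts, in the
    order in which they were written.
    """
    alternatives = []
    for size in range(1, len(parts) + 1):
        for combo in combinations(parts, size):
            alternatives.append(" ".join(combo))
    return alternatives


# Item 6 itself: "A & B" is the same as "A | B | A B".
print(" | ".join(expand_and_or("A", "B")))

# The three-part case used by the origin rule in the grammar.
print(" | ".join(expand_and_or("ZEhA", "VEhA", "VIhA")))
```

For n parts this yields 2^n - 1 alternatives, which is why the grammar
writes & rather than spelling them out.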


text(0) = ((CMENE ... #) | (indicators & free ...))
	[joik-jek] [I [jek | joik] # | NIhO ... # ] (paragraphs | /FAhO/)

paragraphs(4) = paragraph [NIhO ... # paragraphs] /FAhO/

paragraph(10) = paragraph-1 [I [jek | joik] # (paragraph-1 | /POhO#/)] ...

paragraph-1(11) = paragraph-2
	[I [jek | joik] [stag] BO # (paragraph-2 | /POhO#/)] ...

paragraph-2(12) = utterance | [prenex | tag] TUhE paragraphs /TUhU#/

utterance(20) = (ek | gihek | zihek) # | quantifier /POhO#/ | NA /POhO#/ |
	term ... /VAU#/ | prenex | relative-clauses | links | linkargs |
	sentence

prenex(30) = term ... ZOhU #

sentence(40) = bridi-tail | sentence-1

sentence-1(41) = gek sentence-1 gik sentence-1 | prenex sentence |
	term ... /CU#/ bridi-tail

bridi-tail(50) = simple-bridi-tail | mex-relation [term ...] /VAU#/ |
	gek-bridi-tail | tagged-gek-bridi-tail

gek-bridi-tail(51) = gek bridi-tail gik bridi-tail | NA gek-bridi-tail |
	NA tagged-gek-bridi-tail

tagged-gek-bridi-tail(52) = tag KE gek-bridi-tail /KEhE#/

simple-bridi-tail(53) = front-bridi 
	[gihek [stag] KE # simple-bridi-tail /KEhE#/]

front-bridi(60) = (selbri | front-bridi gihek # back-bridi) [term ...] /VAU#/

back-bridi(62) = selbri [term ...] /VAU#/ [gihek [stag] BO # back-bridi]

term(81) = sumti | (tag | FA #) (sumti | /KU#/) | termset | NA KU #

termset(83) = NUhI [NAhE] gek term ... /NUhU#/ gik term ... /NUhU#/ |
	NUhI term ... /NUhU#/ ek # term ... /NUhU#/

sumti(90) = sumti-1 [(joik # | ek #) sumti-1] ...

sumti-1(91) = sumti-2 [ek [stag] BO # sumti-1]

sumti-2(92) = sumti-3 [ek [stag] KE # sumti /KEhE#/] ...

sumti-3(93) = [quantifier] sumti-4 |
	quantifier [quantifier] selbri /KU#/ [relative-clauses]

sumti-4(96) = (LAhE | NAhE BO #) sumti-3 | sumti-5 [relative-clauses] |
	gek sumti gik sumti-3

sumti-5(99) = KOhA # | letteral-string # /BOI/ # | LA CMENE ... # |
	(LA | LE) sumti-tail /KU#/ |
	LI mex LOhO | LUhI sumti /LUhU#/ |
	ZO any-word # | LU text /LIhU/ # | LOhU any-word ... LEhU # |
	ZOI any-word anything any-word #

relative-clauses(110) = relative-clause [zihek # relative-clause] ...

relative-clause(111) = GOI term /GEhU#/ | NOI sentence /KUhO#/

sumti-tail(113) = [sumti-4] [quantifier] selbri | quantifier sumti

selbri(130) = [tag] selbri-1

selbri-1(131) = (NA [tag]) ... selbri-2 ... [CO selbri-1]

selbri-2(133) = selbri-3 [joik-jek selbri-3] ...

selbri-3(134) = selbri-4 [BO selbri-3] |
	[NAhE] guhek selbri gik selbri-3

selbri-4(150) = selbri-5 [CEI selbri-5] ...

selbri-5(151) = selbri-6 [linkargs] | NAhE selbri-5 |
	NU [NAI] [joik-jek NU [NAI]] ... sentence /KEI#/

selbri-6(154) = BRIVLA # | GOhA [RAhO] # |
	(number | letteral-string) MOI # |
	KE selbri-2 ... /KEhE#/ | ME sumti /MEhU#/ |
	NUhA mex-operator | SE # selbri-6

linkargs(160) = BE term [links] /BEhO#/

links(161) = BEI term [links]

quantifier(300) = number # /BOI/ | VEI [GAhO] mex /VEhO [GAhO] #/

mex-relation(301) = DU | DOhE mex-operator | NA mex-relation |
	SE # mex-relation

mex(310) = mex-1 [operator mex-1] ... | FUhA rp-expression

mex-1(311) = mex-2 [BO operator mex-1]

mex-2(312) = operand | PEhO operator mex-3 /KUhE#/

mex-3(313) = mex-2 ... | operator mex-3 /KUhE#/

rp-expression(330) = rp-operand rp-operand operator

rp-operand(332) = operand | rp-expression
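
The two reverse-Polish rules above are mutually recursive, but they can be
recognized with a single counter.  The Python sketch below is an
illustration under a strong assumption: the input has already been reduced
to the symbols "operand" and "operator", whereas in the real grammar each
of these is itself a complex construct.  The function name is hypothetical.

```python
def is_rp_expression(symbols):
    """Return True if symbols form an rp-expression(330).

    depth counts the complete rp-operands waiting for an operator.
    """
    depth = 0
    reduced = False
    for sym in symbols:
        if sym == "operand":
            depth += 1
        elif sym == "operator":
            if depth < 2:
                return False  # an operator needs two rp-operands before it
            depth -= 1        # two rp-operands become one rp-expression
            reduced = True
        else:
            raise ValueError(f"unexpected symbol: {sym!r}")
    # A bare operand is an rp-operand but not an rp-expression.
    return depth == 1 and reduced


print(is_rp_expression(["operand", "operand", "operator"]))
print(is_rp_expression(["operand", "operator"]))
```

The second rp-operand may itself be an rp-expression, so "operand operand
operand operator operator" is also accepted.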

operator(370) = operator-1 [joik-jek operator-1] ...

operator-1(371) = operator-2 | guhek operator-1 gik operator-2

operator-2(372) = mex-operator # | KE operator /KEhE#/

mex-operator(374) = [SE # | NAhE] ... (VUhU | REhO mex-relation
	MAhO letteral-string # /BOI/ | NAhU bridi-tail /TEhU/)

operand(381) = operand-1 [ek # operand-1] ...

operand-1(382) = operand-2 [ek [stag] BO # operand-1]

operand-2(383) = [LAhE ...] operand-3 [ek [stag] KE # operand /KEhE#/] ...

operand-3(385) = quantifier | letteral-string # /BOI/ |
	NIhE selbri-6 [linkargs] /TEhU/ | MOhE sumti /TEhU/ |
	JOhI mex-3 /KUhE#/ | gek operand gik [LAhE ...] operand-3

number(961) = PA [PA | letteral] ...

letteral-string(986) = letteral [PA | letteral] ...

letteral(987) = BY | A BU | I BU | Y BU | ZAI letteral-string FOI |
	LAU letteral | TEI letteral letteral

ek(802) = [NA] [SE] A [NAI]

gihek(818) = [NA] [SE] GIhA [NAI]

zihek(820) = [NA] [SE] ZIhA [NAI]

jek(805) = [NA] [SE] JA [NAI]

joik(806) = [SE] JOI [NAI] | BIhI [GAhO GAhO]

joik-jek(422) = joik # | jek #

gek(807) = [SE] GA [NAI] # | stag GI [NAI] #

guhek(808) = [SE] GUhA [NAI] #

gik(816) = GI [NAI] #

tag(491) = tense-aspect [joik-jek tense-aspect] ... | CUhE #

stag(971) = simple-tense-aspect [(jek | joik) simple-tense-aspect] ... | CUhE

tense-aspect(815) = simple-tense-aspect # | FIhO selbri /FEhU/ #

simple-tense-aspect(972) = [SE] BAI [NAI] | (tense & CAhA) |
	NAhE simple-tense-aspect

tense(975) = [origin [KI]] time & space [KI] | KI

origin(977) = ZEhA & VEhA & VIhA

time(1030) = ZI [time-interval] | (time-offset ...) & time-interval

time-offset(1033) = PU [NAI] [ZI]

time-interval(1034) = (PU [NAI] ZEhA) & interval-modifier

space(1040) = space-1 & (MOhI space-offset)

space-1(1042) = VA [space-interval] | (space-offset ...) & space-interval

space-offset(1045) = FAhA [NAI] [VA]

space-interval(1046) = (FAhA [NAI] (VEhA & VIhA)) & FEhE interval-modifier |
	(FAhA [NAI] (VEhA & VIhA)) interval-modifier

interval-modifier(1050) = interval-property [(ZAhO [interval-property]) ...]

interval-property(1051) = number ROI [NAI] | TAhE [NAI]

free(32) = SEI # [term ... /CU#/] selbri /SEhU/ |
	SOI sumti [sumti] /SEhU/ |
	TIhO mex-operator quantifier /SEhU/ |
	TIhO mex-operator mex-relation mex-operator /SEhU/ |
	vocative selbri [relative-clauses] /DOhU/ |
	vocative CMENE ... # [relative-clauses] /DOhU/ |
	vocative [sumti] /DOhU/ | TO text /TOI/ |
	XI (number | letteral-string) # /BOI/
	
vocative(415) = (COI [NAI]) ... & DOI	

indicators(801) = [FUhE] indicator ...

indicator(907) = (number | letteral-string) MAI | (UI | Y | CAI) [NAI]


The following rules are non-formal:

any-word(1100) = [BAhE] any-word [indicators]

anything = "any text at all, whether Lojban or not"

null(1101) = any-word SI | utterance SA | text SU | POhA | PEhA | DAhO | FUhO

-- 
cowan@marob.masa.com			(aka ...!hombre!marob!cowan)
			e'osai ko sarji la lojban