From cowan Sat Mar 6 23:00:42 2010
Subject: Re: Grammar
To: hombre!uunet!aurs01!jack (Jack Waugh)
From: cowan
Date: Mon, 1 Oct 90 10:38:41 EDT
In-Reply-To: <9009301439.AA07299@aurs01.>; from "Jack Waugh" at Sep 30, 90 10:39 am
X-Mailer: ELM [version 2.3 PL2]
Status: RO
X-From-Space-Date: Mon Oct 1 10:38:41 1990
X-From-Space-Address: cowan
Message-ID:

> I don't recall that we have met, although we might have at a
> Logfest. My name is Jack Waugh. I have been
> interested in constructed predicate-based language for human
> speech since reading L1 in 1976. I happened to move to the DC
> area and so fell in with Bob Lechevalier, for good or ill.

No, I've never been to a Logfest. I joined lei lojbo only at the beginning of this year, and I couldn't make Logfest '90. I will be at LogFair, though.

> I would be interested in seeing a copy of that two-page BNF
> Bob says you wrote.

Attached below; it's about 4 pages long, with plenty of white space. Please do not redistribute this text, as nobody else has seen it or checked it; return comments directly to me.

> It is a bit disappointing, although expected, that you are
> announcing the baselining of the grammar in its current state.
> I think it should have undergone a simplification pass.
> I think I oppose the "let a thousand flowers bloom"
> philosophy. Every learner of the language has to know all
> the little words in order to parse arbitrary grammatical
> utterances. Therefore, complexity in the grammar is very
> expensive in learning time.

I felt this way at first too, until I really began to dig into the YACC version. This is my third attempt at rendering it in BNF: the first two times, I attempted to make an essentially mechanical translation, and the result was riddled with errors. I found I had to go through the YACC version line by line until I understood (at least to a degree) the semantic import of everything in it. Then I could re-express the syntax which captures that semantics in BNF notation.

It is certainly true that to parse Lojban >completely<, one must know the whole grammar. However, for human beings complete parsing is not always necessary. We understand many sentences that do not parse, e.g. *"me see she", and there are sentences that we can parse but cannot understand on the fly, e.g. "The man that the dog that the fly that the virus infected bit bit bit me". We can generate this sentence, but trying to understand it in speech provokes stack overflow in the listener. Computer listeners, of course, must be able to parse everything, but that is clearly within the capabilities of existing systems: after all, a parser exists. The machine parser will be able to cope with the Lojban version of the second sample sentence above, but humans will still not be able to understand it even in Lojban (I think -- if that turns out not to be the case, it will be a powerful Whorfian effect indeed!).

> Predicate language is not supposed to be a word-for-word
> translation of a natural language. It started out
> radically different. I suspect some of the grammatical
> complications that have crept in will let speakers continue
> in the grammatical viewpoint of natural languages and thus
> have less tendency to adopt the predicate viewpoint and
> think about what they really mean to say. Of course, I have
> no place from which to stand and complain, since I have not
> given time to the grammar myself.

I think that "word-for-word translation" is hardly likely. There are a few points which "accommodate" natural language, like internal "naku" negation (which allows us to replicate the "illogic" of English "Some people don't like going there"), but they are highly marked and rather hard to use correctly. The easy and natural way of saying things in Lojban remains the predicate way, or perhaps the predicate-plus-tanru way.
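
The "stack overflow" point can be made concrete with a toy model. This Python sketch is only an illustration added here, not anything from the parser: it assumes the English sentence has already been reduced to abstract categories ("N" for a subject still waiting for its verb, "that" for a relative pronoun, "V" for a verb, "O" for the final object), and it counts how many subjects a listener must hold at once before the verbs start arriving.

```python
from typing import List


def peak_pending_subjects(tokens: List[str]) -> int:
    """Toy model of a center-embedded English sentence.

    Tokens (a simplification made for this sketch):
      "N"    a subject noun phrase that is still waiting for its verb
      "that" a relative pronoun opening a nested clause
      "V"    a verb, which closes the innermost clause still open
      "O"    the object of the outermost verb
    Returns the largest number of subjects held at once, i.e. the stack
    depth a listener needs to pair each verb with its subject.
    """
    pending = 0
    peak = 0
    for tok in tokens:
        if tok == "N":
            pending += 1
            peak = max(peak, pending)
        elif tok == "V":
            if pending == 0:
                raise ValueError("verb with no subject waiting for it")
            pending -= 1
        elif tok not in ("that", "O"):
            raise ValueError(f"unknown token {tok!r}")
    if pending:
        raise ValueError(f"{pending} subject(s) never received a verb")
    return peak


# "The man that the dog that the fly that the virus infected bit bit bit me"
embedded = ["N", "that", "N", "that", "N", "that", "N", "V", "V", "V", "V", "O"]
print(peak_pending_subjects(embedded))  # 4

# "The man bit me"
print(peak_pending_subjects(["N", "V", "O"]))  # 1
```

A bare counter suffices here only because the verbs close the clauses in strict reverse order; a real parser keeps the phrases themselves on a stack so it can build the tree, bookkeeping that a machine does without strain at any depth.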

My favorite example for illustrating that Lojban is not just an "English-based code", but has its own grammatical structure, is this translation of Simonides' epigram at Thermopylae:

    Tell them in Lakedaimon, passerby
    That here, obedient to their laws, we lie.

    ko cusku fi me la lykedaimon. do klama do'u fe le nu mi nu tinbe le ri flalu kei morsi

Literally: (Imperative!) You express-to those pertaining-to Lakedaimon, O comer/goer, the event-of we being-an event-of (we obey the laws of the last-mentioned) kind-of dead.

This construction, with a tanru based on an abstraction (the inner "nu") itself contained within an abstraction, is easy and natural Lojban.

> Let grammar-1 mean the mathematical function that maps from
> utterances to parses (or to rejections as ungrammatical),
> independently of how that function might be expressed or
> prescribed. Then if A and B stand for grammar-1s and U is
> a putative utterance, if for all U A(U) = B(U) then A = B.
>
> Let a grammar-2 (you might know better terms than grammar-1
> and grammar-2) be a pair (E, L) such that E is an expression
> in some mathematical notation and L (for "language") is
> the meaning of the notation used in E, such that L(E) is a
> grammar-1.
>
> Let a grammar-3 be the expression E of a grammar-2 (E, L).
>
> Then the official YACC grammar of Lojban is a grammar-3 (and L is
> YACC plus the preprocessor, in essence).
>
> A goal often expressed for a grammar-3 of a version of Loglan is
> that it be "unambiguous". Hidden behind this is some kind of
> constraint or criterion on L, since L could always be constructed
> so that E couldn't be ambiguous.

Okay, I believe I understand your definitions. In this context, L is characterized by the term LALR(1): any E that YACC accepts can be parsed mechanically, left to right, by a deterministic pushdown automaton (a finite-state control plus a stack) that decides each step using only 1) the left context so far, as summarized on its stack, and 2) the next available symbol.
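
To make the grammar-1/grammar-3 distinction concrete, here is a small Python sketch added purely as an illustration. It treats two regular expressions as grammar-3s written in the same L (Python's `re` notation) and compares the grammar-1s they denote by checking accept/reject on every utterance over a toy two-symbol alphabet, up to a fixed length. Two simplifications are assumed: a real grammar-1 maps utterances to parses, not just yes/no, and a finite check can only find a counterexample, never prove that A = B.

```python
import itertools
import re
from typing import Optional


def accepts(pattern: str, utterance: str) -> bool:
    """The grammar-1 denoted by `pattern`, reduced to accept/reject."""
    return re.fullmatch(pattern, utterance) is not None


def first_disagreement(pattern_a: str, pattern_b: str,
                       alphabet: str = "AB", max_len: int = 4) -> Optional[str]:
    """Compare two grammar-3s on every utterance up to max_len symbols.

    Returns the first utterance on which they disagree, or None if they
    agree on all of them. None is evidence, not proof, that A = B.
    """
    for n in range(max_len + 1):
        for symbols in itertools.product(alphabet, repeat=n):
            utterance = "".join(symbols)
            if accepts(pattern_a, utterance) != accepts(pattern_b, utterance):
                return utterance
    return None


# Two different expressions for "A, or B, or A followed by B":
print(first_disagreement("A|B|AB", "AB?|B"))  # None
# Dropping the "A followed by B" case changes the grammar-1:
print(first_disagreement("A|B|AB", "A|B"))    # AB
```

The first pair shows two distinct grammar-3s that (as far as the check goes) denote the same grammar-1, which is the sense in which the BNF and YACC versions of the grammar are meant to be equivalent.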
The compounder serves to pack up certain word combinations into single symbols for the parser's benefit, and is itself LALR(1) or even weaker.

What point(s) are you going to make with these definitions? Please study the BNF below and fill me in further.

-----cut here-----cut here-----cut here-----cut here-----cut here-----cut here

LOJBAN MACHINE GRAMMAR, BNF VERSION, BASELINED AS OF 20 JULY 1990

COPYRIGHT 1989,1990 THE LOGICAL LANGUAGE GROUP, INC.
CONTACT THAT ORGANIZATION AT 2904 BEAU LANE, FAIRFAX VA 22031 USA
703-385-0273

PERMISSION TO COPY GRANTED SUBJECT TO YOUR VERIFICATION THAT THIS IS THE LATEST VERSION OF THE LOJBAN GRAMMAR, THAT YOUR DISTRIBUTION BE FOR PROMOTION OF LOJBAN, THAT THERE IS NO CHARGE FOR THE PRODUCT, AND THAT THIS COPYRIGHT NOTICE IS INCLUDED INTACT IN THE COPY.

Explanation of notation:

All rules have the form:

    name(number) = bnf-expression

which means that the grammatical construct "name" is defined by "bnf-expression". The number cross-references this grammar with the rule numbers in the YACC grammar.

1) Names in lower case are grammatical constructs.
2) Names in UPPER CASE are selma'o (lexeme) names, and are terminals.
3) Concatenation is expressed by juxtaposition with no operator symbol.
4) | represents alternation (choice).
5) [] represents an optional element.
6) & represents and/or ("A & B" is the same as "A | B | A B").
7) ... represents optional repetition of the construct to the left. Left-grouping is implied; right-grouping is shown by explicit recursion.
8) () serves to indicate the grouping of the other operators.
9) # is shorthand for "[free ...]", a construct which appears in many places.
10) // encloses an elidable terminator, which may be omitted (without change of meaning) if no grammatical ambiguity results.

text(0) = ((CMENE ... #) | (indicators & free ...)) [joik-jek] [I [jek | joik] # | NIhO ... # ] (paragraphs | /FAhO/)
paragraphs(4) = paragraph [NIhO ... # paragraphs] /FAhO/
paragraph(10) = paragraph-1 [I [jek | joik] # (paragraph-1 | /POhO#/)] ...
paragraph-1(11) = paragraph-2 [I [jek | joik] [stag] BO # (paragraph-2 | /POhO#/)] ...
paragraph-2(12) = utterance | [prenex | tag] TUhE paragraphs /TUhU#/
utterance(20) = (ek | gihek | zihek) # | quantifier /POhO#/ | NA /POhO#/ term ... /VAU#/ | prenex | relative-clauses | links | linkargs | sentence
prenex(30) = term ... ZOhU #
sentence(40) = bridi-tail | sentence-1
sentence-1(41) = gek sentence-1 gik sentence-1 | prenex sentence | term ... /CU#/ bridi-tail
bridi-tail(50) = simple-bridi-tail | mex-relation [term ...] /VAU#/ | gek-bridi-tail | tagged-gek-bridi-tail
gek-bridi-tail(51) = gek bridi-tail gik bridi-tail | NA gek-bridi-tail | NA tagged-gek-bridi-tail
tagged-gek-bridi-tail(52) = tag KE gek-bridi-tail /KEhE#/
simple-bridi-tail(53) = front-bridi [gihek [stag] KE # simple-bridi-tail /KEhE#/]
front-bridi(60) = (selbri | front-bridi gihek # back-bridi) [term ...] /VAU#/
back-bridi(62) = selbri [term ...] /VAU#/ [gihek [stag] BO # back-bridi]
term(81) = sumti | (tag | FA #) (sumti | /KU#/) | termset | NA KU #
termset(83) = NUhI [NAhE] gek term ... /NUhU#/ gik term ... /NUhU#/ | NUhI term ... /NUhU#/ ek # term ... /NUhU#/
sumti(90) = sumti-1 [(joik # | ek #) sumti-1] ...
sumti-1(91) = sumti-2 [ek [stag] BO # sumti-1]
sumti-2(92) = sumti-3 [ek [stag] KE # sumti /KEhE#/] ...
sumti-3(93) = [quantifier] sumti-4 | quantifier [quantifier] selbri /KU#/ [relative-clauses]
sumti-4(96) = (LAhE | NAhE BO #) sumti-3 | sumti-5 [relative-clauses] | gek sumti gik sumti-3
sumti-5(99) = KOhA # | letteral-string # /BOI/ # | LA CMENE ... # | (LA | LE) sumti-tail /KU#/ | LI mex LOhO | LUhI sumti /LUhU#/ | ZO any-word # | LU text /LIhU/ # | LOhU any-word ... LEhU # | ZOI any-word anything any-word #
relative-clauses(110) = relative-clause [zihek # relative-clause] ...
relative-clause(111) = GOI term /GEhU#/ | NOI sentence /KUhO#/
sumti-tail(113) = [sumti-4] [quantifier] selbri | quantifier sumti
selbri(130) = [tag] selbri-1
selbri-1(131) = (NA [tag]) ... selbri-2 ... [CO selbri-1]
selbri-2(133) = selbri-3 [joik-jek selbri-3] ...
selbri-3(134) = selbri-4 [BO selbri-3] | [NAhE] guhek selbri gik selbri-3
selbri-4(150) = selbri-5 [CEI selbri-5] ...
selbri-5(151) = selbri-6 [linkargs] | NAhE selbri-5 | NU [NAI] [joik-jek NU [NAI]] ... sentence /KEI#/
selbri-6(154) = BRIVLA # | GOhA [RAhO] # | (number | letteral-string) MOI # | KE selbri-2 ... /KEhE#/ | ME sumti /MEhU#/ | NUhA mex-operator | SE # selbri-6
linkargs(160) = BE term [links] /BEhO#/
links(161) = BEI term [links]
quantifier(300) = number # /BOI/ | VEI [GAhO] mex /VEhO [GAhO] #/
mex-relation(301) = DU | DOhE mex-operator | NA mex-relation | SE # mex-relation
mex(310) = mex-1 [operator mex-1] ... | FUhA rp-expression
mex-1(311) = mex-2 [BO operator mex-1]
mex-2(312) = operand | PEhO operator mex-3 /KUhE#/
mex-3(313) = mex-2 ... | operator mex-3 /KUhE#/
rp-expression(330) = rp-operand rp-operand operator
rp-operand(332) = operand | rp-expression
operator(370) = operator-1 [joik-jek operator-1] ...
operator-1(371) = operator-2 | guhek operator-1 gik operator-2
operator-2(372) = mex-operator # | KE operator /KEhE#/
mex-operator(374) = [SE # | NAhE] ... (VUhU | REhO mex-relation MAhO letteral-string # /BOI/ | NAhU bridi-tail /TEhU/)
operand(381) = operand-1 [ek # operand-1] ...
operand-1(382) = operand-2 [ek [stag] BO # operand-1]
operand-2(383) = [LAhE ...] operand-3 [ek [stag] KE # operand /KEhE#/] ...
operand-3(385) = quantifier | letteral-string # /BOI/ | NIhE selbri-6 [linkargs] /TEhU/ | MOhE sumti /TEhU/ | JOhI mex-3 /KUhE#/ | gek operand gik [LAhE ...] operand-3
number(961) = PA [PA | letteral] ...
letteral-string(986) = letteral [PA | letteral] ...
letteral(987) = BY | A BU | I BU | Y BU | ZAI letteral-string FOI | LAU letteral | TEI letteral letteral
ek(802) = [NA] [SE] A [NAI]
gihek(818) = [NA] [SE] GIhA [NAI]
zihek(820) = [NA] [SE] ZIhA [NAI]
jek(805) = [NA] [SE] JA [NAI]
joik(806) = [SE] JOI [NAI] | BIhI [GAhO GAhO]
joik-jek(422) = joik # | jek #
gek(807) = [SE] GA [NAI] # | stag GI [NAI] #
guhek(808) = [SE] GUhA [NAI] #
gik(816) = GI [NAI] #
tag(491) = tense-aspect [joik-jek tense-aspect] ... | CUhE #
stag(971) = simple-tense-aspect [(jek | joik) simple-tense-aspect] ... | CUhE
tense-aspect(815) = simple-tense-aspect # | FIhO selbri /FEhU/ #
simple-tense-aspect(972) = [SE] BAI [NAI] | (tense & CAhA) | NAhE simple-tense-aspect
tense(975) = [origin [KI]] time & space [KI] | KI
origin(977) = ZEhA & VEhA & VIhA
time(1030) = ZI [time-interval] | (time-offset ...) & time-interval
time-offset(1033) = PU [NAI] [ZI]
time-interval(1034) = (PU [NAI] ZEhA) & interval-modifier
space(1040) = space-1 & (MOhI space-offset)
space-1(1042) = VA [space-interval] | (space-offset ...) & space-interval
space-offset(1045) = FAhA [NAI] [VA]
space-interval(1046) = (FAhA [NAI] (VEhA & VIhA)) & FEhE interval-modifier | (FAhA [NAI] (VEhA & VIhA)) interval-modifier
interval-modifier(1050) = interval-property [(ZAhO [interval-property]) ...]
interval-property(1051) = number ROI [NAI] | TAhE [NAI]
free(32) = SEI # [term ... /CU#/] selbri /SEhU/ | SOI sumti [sumti] /SEhU/ | TIhO mex-operator quantifier /SEhU/ | TIhO mex-operator mex-relation mex-operator /SEhU/ | vocative selbri [relative-clauses] /DOhU/ | vocative CMENE ... # [relative-clauses] /DOhU/ | vocative [sumti] /DOhU/ | TO text /TOI/ | XI (number | letteral-string) # /BOI/
vocative(415) = (COI [NAI]) ... & DOI
indicators(801) = [FUhE] indicator ...
indicator(907) = (number | letteral-string) MAI | (UI | Y | CAI) [NAI]

The following rules are non-formal:

any-word(1100) = [BAhE] any-word [indicators]
anything = "any text at all, whether Lojban or not"
null(1101) = any-word SI | utterance SA | text SU | POhA | PEhA | DAhO | FUhO

--
cowan@marob.masa.com (aka ...!hombre!marob!cowan)
e'osai ko sarji la lojban
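
In the attached grammar, rp-expression(330) and rp-operand(332) describe reverse-Polish arithmetic (entered from mex(310) via FUhA): two rp-operands followed by an operator, where either operand may itself be an rp-expression. A stack evaluator is the usual way to process such input. The Python sketch below is an illustration only; it assumes plain numbers as operands and four ASCII arithmetic symbols as stand-ins for Lojban VUhU operators, neither of which is part of the grammar.

```python
from typing import Callable, Dict, List, Union

Token = Union[int, float, str]

# Stand-ins for Lojban VUhU operators (an assumption of this sketch).
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def eval_rp(tokens: List[Token]) -> float:
    """Evaluate input shaped like rp-expression(330):
    rp-operand rp-operand operator, where each rp-operand(332) is a
    plain operand or a nested rp-expression."""
    if len(tokens) < 3:
        raise ValueError("an rp-expression needs two operands and an operator")
    stack: List[float] = []
    for tok in tokens:
        if isinstance(tok, str):
            # An operator consumes the two most recent results.
            if len(stack) < 2:
                raise ValueError(f"operator {tok!r} lacks two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[tok](left, right))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("leftover operands: not one complete rp-expression")
    return stack[0]


print(eval_rp([3, 4, "+", 2, "*"]))   # 14.0, i.e. (3 + 4) * 2
print(eval_rp([10, 2, 3, "*", "-"]))  # 4.0, i.e. 10 - (2 * 3)
```

Because each operator takes exactly the two most recent results, no grouping markers or precedence rules are needed, which is presumably why this mode can be described with just two rules.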
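
The letter says only that the compounder packs certain word combinations into single symbols before parsing; it does not say which ones. As a purely hypothetical illustration of that idea, the Python sketch below applies it to the shape of number(961) from the attached grammar, folding each run of the form PA [PA | letteral] ... into one NUMBER token. The (selma'o, word) token format, the NUMBER token name, and the restriction of letterals to BY are all assumptions of this sketch, not a description of what the real compounder does.

```python
from typing import List, Tuple

# A token is (selma'o, word); this format is assumed for the sketch.
Token = Tuple[str, str]

# Simplification: only BY-class words count as letterals here.
LETTERAL_SELMAHO = {"BY"}


def compound_numbers(tokens: List[Token]) -> List[Token]:
    """Fold each run shaped like number(961) = PA [PA | letteral] ...
    into a single hypothetical ("NUMBER", "<words>") token."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        selmaho, word = tokens[i]
        if selmaho != "PA":
            out.append(tokens[i])
            i += 1
            continue
        # A number must start with PA; later words may be PA or letterals.
        words = [word]
        i += 1
        while i < len(tokens) and (tokens[i][0] == "PA"
                                   or tokens[i][0] in LETTERAL_SELMAHO):
            words.append(tokens[i][1])
            i += 1
        out.append(("NUMBER", " ".join(words)))
    return out


# "mi viska ci re gerku": the digit words ci (3) and re (2) become one symbol.
sample = [("KOhA", "mi"), ("BRIVLA", "viska"),
          ("PA", "ci"), ("PA", "re"), ("BRIVLA", "gerku")]
print(compound_numbers(sample))
# [('KOhA', 'mi'), ('BRIVLA', 'viska'), ('NUMBER', 'ci re'), ('BRIVLA', 'gerku')]
```

Doing this grouping before parsing means the parser sees one symbol where the input had several words, which keeps its one-symbol lookahead pointed at the next meaningful unit.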