Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating a few jbovlaste entries

On Tue, Jan 20, 2015 at 8:35 PM, And Rosta <and.rosta@gmail.com> wrote:

Jorge Llambías, On 20/01/2015 19:38:

On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and.rosta@gmail.com <mailto:and.rosta@gmail.com>> wrote:
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjllambias@gmail.com <mailto:jjllambias@gmail.com>> wrote:

Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:

(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates

Rather:

(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates

You seem to have just merged (2) and (3) into (2'),

No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban delibrately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').

Ok, I see. Then my (3) and (4) are merged into your (3'), with the proviso that you think (3) is either useless or possibly detrimental to achieving (3').

BTW, don't the C's and V's of the traditional definition give some phonological structure beyond mere phoneme strings? The PEG morphology also makes use of syllables and their onset-nucleus-coda components. That's phonological structure, right?

which may be more general, but in the particular case of Lojban we
know that (2') can be achieved in two independent steps, one step
that takes any string of phonemes and unambiguously dissects it into
a string of words (possibly including non-lojban words),

yes

and a second step that takes the resulting string of words as input
and unambiguously gives a unique tree structure for them (or else
rejects the string of words as ungrammatical).

No. The second step (my (3')) takes the string of phonological words but it doesn't give a *syntactic* tree structure whose terminal nodes are phonological words, which is what I take "gives a tree structure for them" to mean. Not every syntactic node need correspond to a phonological one (e.g. ellipsis, which Lojban uses) and a phonological word can correspond to more than one syntactic one (e.g. English _you're_ is one phonological word corresponding to a sequence of a pronoun and an auxiliary).

In Lojban we could say that the reverse happens with "ybu", which would be one syntactic word consisting of two phonological words (defining syntactic word as those that can be quoted with "zo" according to PEG).

Rather, step (3') uses the rules that define correspondences between elements of the sentence's phonology and elements of the sentence's syntax, to find a sentence syntax that -- in Lojban's case, uniquely -- licitly corresponds to the sentence's phonology.

Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.

But Tersmu output is basically FOPL, which has its own formal grammar (on which Lojban's formal grammar is based). I still don't see what problems formal grammars create.

> If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.

Right. So I think (3) is not a valid step.

But why is it invalid if it achieves the desired result?

It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.

I can accept that, or perhaps "regardless of (3)", but I agree not "because of (3)". But I'm not sure there's much left of Lojban if we remove (3).

The current PEG doesn't produce binary branching exclusively,
although it can probably be tweaked to do that by adding many
intermediate rules. Why is unary branching bad?

Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.

Ok. As an example, the PEG has:

statement <- statement-1 / prenex statement

statement-1 <- statement-2 (I-clause joik-jek statement-2?)*

The first rule means that a "statement" node can unary branch into a "statement-1" node, or binary branch into "prenex" and "statement" nodes. The PEG could instead just be:

   statement <- statement-2 (I-clause joik-jek statement-2?)* / prenex statement

and completely bypass the statement-1 node, which is indeed superfluous. The PEG can probably be re-written so as to eliminate all unary branching, although there may be a price in clarity.

There are many rules where one of the branches is optional, so that
would result either in an empty leaf or a unary branch.

Say you've got an optionally transitive/intransitive verb, such as English _swallow_. When it has an object, they jointly form a binary branching phrase. When it lacks an object, then there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal node that is superfluous.)

OK, but is this more than just aesthetics? Unary branches don't do anything useful, but are they harmful other than in cluttering the tree with superfluous nodes? I'm probably asking the wrong questions anyway, because I'm not yet capable of identifying the problem.

Would you want binary branching all the way down to phonemes, or just
to words?

Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes.

Ok, but in Lojban there's almost a one-to-one match between phonological and syntactic words.

I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (-- English for sure; other languages - probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.

When you say something like "I believe natlang syntax is binary branching" I realize we have a different idea about what syntax is, because I can't have any beliefs one way or the other on whether natlang syntax is binary branching or not. Let me try to explain with a simple Lojban example. One could posit several different syntactic structures for the sumti "lo broda ku":

(1) (lo broda)- -ku

(2) lo- -(broda ku)

(3) (lo- -ku) -broda-

(4) lo- -broda- -ku

For me they are all defensible. (1) probably reflects best how "ku" was born, a "spoken comma", something that separates the fully formed sumti "lo broda" from the rest of the sentence. (2) may reflect best my psychological introspective understanding of "ku" as a terminator of the sumti-tail. (3) reflects a popular take where lo...ku are brackets around a selbri that convert it into a sumti, and (4) happens to best match what PEG, YACC and BNF do, since they give a node with three branches.

If I understand you correctly, only one of those four could correctly reflect Lojban syntax, whereas for me all four are equally valid takes since in the end it makes no difference which one we choose. Now in the case of Lojban we could say that only one of these is the officially correct syntax (currently that would be 4), but if something like that happens in natlangs, does it make sense to talk of "the syntax" for the natlang as opposed to "a syntax"?

mu'o mi'e xorxes