Jorge Llambías, On 20/01/2015 19:38:
On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and.rosta@gmail.com> wrote:
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjllambias@gmail.com> wrote:
Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
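Or, to draw it as a pipeline (just to check I have the division of labour right): sounds/characters -> (1) -> string of phonemes -> (2) -> string of words -> (3) -> tree structure -> (4) -> terms, predicates, coreference and argument links.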
Rather:
(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
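(To illustrate (3') with a deliberately trivial case: for "mi prami do" the output would amount to something like prami(mi, do), with "prami" identified as the predicate and "mi" and "do" as terms filling its first and second argument places.)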
You seem to have just merged (2) and (3) into (2'),
No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').
which may be more general, but in the particular case of Lojban we
know that (2') can be achieved in two independent steps, one step
that takes any string of phonemes and unambiguously dissects it into
a string of words (possibly including non-Lojban words),
yes
and a second step that takes the resulting string of words as input
and unambiguously gives a unique tree structure for them (or else
rejects the string of words as ungrammatical).
No. The second step (my (3')) takes the string of phonological words as input, but it doesn't give a *syntactic* tree structure whose terminal nodes are phonological words, which is what I take "gives a tree structure for them" to mean. Not every syntactic node need correspond to a phonological one (e.g. ellipsis, which Lojban uses), and a phonological word can correspond to more than one syntactic one (e.g. English _you're_ is one phonological word corresponding to a sequence of a pronoun and an auxiliary).
Rather, step (3') uses the rules that define correspondences between elements of the sentence's phonology and elements of the sentence's syntax, to find a sentence syntax that -- in Lojban's case, uniquely -- licitly corresponds to the sentence's phonology.
Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
Right. So I think (3) is not a valid step.
But why is it invalid if it achieves the desired result?
It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.
The current PEG doesn't produce binary branching exclusively,
although it can probably be tweaked to do that by adding many
intermediate rules. Why is unary branching bad?
Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.
The first rule means that a "statement" node can unary branch into a "statement-1" node, or binary branch into "prenex" and "statement" nodes. The PEG could instead just be:
statement <- statement-2 (I-clause joik-jek statement-2?)* / prenex statement
and completely bypass the statement-1 node, which is indeed superfluous. The PEG can probably be re-written so as to eliminate all unary branching, although there may be a price in clarity.
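(For reference, the two rules being discussed are, roughly:

statement <- statement-1 / prenex statement
statement-1 <- statement-2 (I-clause joik-jek statement-2?)*

so the version above just inlines statement-1 into statement.)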
There are many rules where one of the branches is optional, so that
would result either in an empty leaf or a unary branch.
Say you've got an optionally transitive verb, such as English _swallow_. When it has an object, it and its object jointly form a binary-branching phrase. When it lacks an object, there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching whose daughter is a nonterminal node that is superfluous.)
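In rough bracket terms (brackets only, labels omitted): with an object, [ [I] [ [swallow] [it] ] ]; without one, just [ [I] [swallow] ], with no extra node of the sort you'd get in [ [I] [ [swallow] ] ].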
Would you want binary branching all the way down to phonemes, or just
to words?
Syntactic words and phonemes don't exist on the same plane: syntactic words aren't made up of phonemes, and phonemes aren't constituents of syntactic words.
I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (English for sure; other languages probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.