Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating a few jbovlaste entries
Jorge Llambías, On 21/01/2015 12:33:
On Tue, Jan 20, 2015 at 8:35 PM, And Rosta <and.rosta@gmail.com> wrote:
Jorge Llambías, On 20/01/2015 19:38:
On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and.rosta@gmail.com> wrote:
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjllambias@gmail.com> wrote:
Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
Rather:
(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
You seem to have just merged (2) and (3) into (2'),
No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').
Ok, I see. Then my (3) and (4) are merged into your (3'), with the
proviso that you think (3) is either useless or possibly detrimental
to achieving (3').
Yes.
BTW, don't the C's and V's of the traditional definition give some
phonological structure beyond mere phoneme strings? The PEG
morphology also makes use of syllables and their onset-nucleus-coda
components. That's phonological structure, right?
Yes, but I am conscious of being among people more mathematically-minded than I am, so I shrink from attempting to pronounce on what sort of structure goes beyond mere patterning in a string. At any rate, yes the traditional definition does impose some phonological structure; but whether that is hierarchical rather than linear, I am uncertain.
Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
But Tersmu output is basically FOPL, which has its own formal grammar
(on which Lojban's formal grammar is based). I still don't see what
problems formal grammars create.
(3') must certainly involve a grammar, and I can't think of any sense in which a grammar could meaningfully be called 'informal', so I'm happy to call that grammar 'formal'. But it differs from the CS (or at least the Lojban) notion primarily in not having phonological objects as any of its nodes and secondarily in not necessarily being simply a labelled bracketing of a string.
If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
Right. So I think (3) is not a valid step.
But why is it invalid if it achieves the desired result?
It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.
I can accept that, or perhaps "regardless of (3)", but I agree not "because of (3)". But I'm not sure there's much left of Lojban if we remove (3).
To the extent that Lojban is a language, (3) doesn't really constitute any part of Lojban (despite the mistaken belief of many Lojbanists to the contrary). Also, to the extent that Lojban is a language, there exists an implicit version of (3'), albeit not necessarily one that is coherent or unambiguous. So I would recommend removing the current Formal Grammars from the definition of Lojban, and replacing them by one -- an explicit (3') -- that more credibly represents actual human language (but is unambiguous etc.).
The current PEG doesn't produce binary branching exclusively,
although it can probably be tweaked to do that by adding many
intermediate rules. Why is unary branching bad?
Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.
Ok. As an example, the PEG has:
statement <- statement-1 / prenex statement
statement-1 <- statement-2 (I-clause joik-jek statement-2?)*
The first rule means that a "statement" node can unary branch into a "statement-1" node, or binary branch into "prenex" and "statement" nodes. The PEG could instead just be:
statement <- statement-2 (I-clause joik-jek statement-2?)* / prenex statement
and completely bypass the statement-1 node, which is indeed superfluous. The PEG can probably be rewritten so as to eliminate all unary branching, although there may be a price in clarity.
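The same elimination can be illustrated on a parse tree rather than on the grammar. What follows is only a minimal Python sketch, not part of any Lojban parser: the `(label, children)` tuple encoding, the function name `collapse_unary`, and the choice to hang the words "mi klama" directly under statement-2 are all assumptions made for illustration. It splices out every nonterminal whose only daughter is itself a nonterminal, keeping the mother's label.

```python
# Hypothetical tree encoding (an assumption for this sketch):
# a leaf is a plain string (a word); a nonterminal is (label, [children]).

def collapse_unary(node):
    """Return a copy of the tree with unary nonterminal-over-nonterminal
    branches removed. The mother's label is kept; the daughter's is dropped."""
    if isinstance(node, str):          # leaf: nothing to collapse
        return node
    label, children = node
    children = [collapse_unary(c) for c in children]
    # A mother whose only daughter is itself a nonterminal adds no structure,
    # so the mother adopts the daughter's children directly.
    if len(children) == 1 and not isinstance(children[0], str):
        _, grandchildren = children[0]
        return (label, grandchildren)
    return (label, children)


# "statement" unary-branches into "statement-1", which unary-branches into
# "statement-2" (labels from the PEG rules quoted above; the flat "mi klama"
# under statement-2 is a simplification).
tree = ("statement", [("statement-1", [("statement-2", ["mi", "klama"])])])
collapsed = collapse_unary(tree)
print(collapsed)
```

Unary branches over a terminal are left alone in this sketch; only nonterminal-over-nonterminal chains are merged.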
Good. Also questionable is the extent to which a nonterminal node can have properties/labels not simply derived from the label of the head daughter: the range of views among syntacticians is too hard to summarize in one sentence here, but certainly one does not come across syntactic trees for natlang sentences with a pattern of labellings resembling Lojban's, i.e. where the relationship between labels on the mother and the daughters is unconstrained.
There are many rules where one of the branches is optional, so that
would result either in an empty leaf or a unary branch.
Say you've got an optionally transitive/intransitive verb, such as English _swallow_. When it has an object, they jointly form a binary branching phrase. When it lacks an object, then there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal node that is superfluous.)
OK, but is this more than just aesthetics? Unary branches don't do
anything useful, but are they harmful other than in cluttering the
tree with superfluous nodes?
They're harmless clutter if there's no contrast with a version of the tree where mother and singleton daughter merge into the same node. You need to consider the branching issue together with the labelling issue. If mother and head-daughter have the same label, then the redundancy of unary branching is plain.
Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes.
Ok, but in Lojban there's almost a one-to-one match between
phonological and syntactic words.
That remains to be seen, because there isn't yet an explicit real syntax for Lojban. However, it's perfectly possible that in Lojban, phonology--syntax mismatches are rare.
I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (-- English for sure; other languages - probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.
When you say something like "I believe natlang syntax is binary
branching" I realize we have a different idea about what syntax is,
because I can't have any beliefs one way or the other on whether
natlang syntax is binary branching or not. Let me try to explain with
a simple Lojban example.
I'm not sure if choosing a simple Lojban example is going to reveal why you can't have beliefs about binary branching in natlangs. Syntax is a set of rules for combining the combinatorial units of syntax in ways that are combinatorially licit and that combine the units' phonological forms and their meanings. I suspect (but excuse me if I'm mistaken) that for you every set of rules that defines the correct set of sentences is equally valid, so that so long as the rules match the right sentence sounds to the right sentence meanings, it doesn't matter what the intermediate structure is like; if the syntactician has a job, it is to work out *a* set of rules, but there is no reason to think there is only one correct set of rules. In contrast, pretty much all linguisticians think (but not always for the same reasons) that of the sets of rules that generate the same, correct, set of sentences, some of those sets are right and some are wrong, or at least some are righter and some are wronger. In my case I think the rules matter because (i) to understand the system you need to understand its internal mechanics, and (ii) a speaker knows a certain set of rules, and it's known-rules that are my object of study.
One could posit several different syntactic structures for the sumti
"lo broda ku":
(1) (lo broda)- -ku
(2) lo- -(broda ku)
(3) (lo- -ku) -broda-
(4) lo- -broda- -ku
For me they are all defensible. (1) probably reflects best how "ku" was born, a "spoken comma", something that separates the fully formed sumti "lo broda" from the rest of the sentence. (2) may reflect best my psychological introspective understanding of "ku" as a terminator of the sumti-tail. (3) reflects a popular take where lo...ku are brackets around a selbri that convert it into a sumti, and (4) happens to best match what PEG, YACC and BNF do, since they give a node with three branches.
If I understand you correctly, only one of those four could correctly
reflect Lojban syntax, whereas for me all four are equally valid
takes since in the end it makes no difference which one we choose.
Now in the case of Lojban we could say that only one of these is the
officially correct syntax (currently that would be 4), but if
something like that happens in natlangs, does it make sense to talk
of "the syntax" for the natlang as opposed to "a syntax"?
Note that I want to distinguish between "ideas that are obviously wrong" and "ideas I don't agree with"; the main points I wanted to make in this thread pertain to the former sort, whereas my objection to non-binary-branching is of only the latter sort. But anyway, with that caveat declared, I'd say (on the basis of my tentative belief about the binarity of branching) that (4) is invalid because there is no mechanism for building it, and (3) is, absent any additional syntactic structure, also invalid because there is no way to generate the right order of phonological words from that syntactic structure. It's unlikely that the arguments for (1) and (2) are equally strong, but still it's possible that the grammar allows both structures or that there are multiple parallel equally viable grammars. FWIW I, who was never much of a Lojban syntactician, think (1) looks to be better than (2).
--And.
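
The two objections in the reply above can be checked mechanically against the four candidate structures for "lo broda ku". This is a hedged Python sketch under stated assumptions, not a claim about how Lojban syntax is actually defined: the nested-list encoding and the names `CANDIDATES`, `is_binary` and `leaves` are invented here, and "no way to generate the right order" is simplified to "reading the leaves left to right does not give the spoken order".

```python
# Spoken order of the phonological words.
TARGET = ["lo", "broda", "ku"]

# Hypothetical encoding: nested lists are constituents, strings are words.
CANDIDATES = {
    1: [["lo", "broda"], "ku"],   # (lo broda)- -ku
    2: ["lo", ["broda", "ku"]],   # lo- -(broda ku)
    3: [["lo", "ku"], "broda"],   # (lo- -ku) -broda-
    4: ["lo", "broda", "ku"],     # lo- -broda- -ku
}

def is_binary(tree):
    """True if every constituent has exactly two daughters."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(t) for t in tree)

def leaves(tree):
    """Words of the tree read left to right, with no extra ordering rule."""
    if isinstance(tree, str):
        return [tree]
    return [w for t in tree for w in leaves(t)]

# Keep only structures that are binary AND linearize to the spoken order.
survivors = [n for n, t in CANDIDATES.items()
             if is_binary(t) and leaves(t) == TARGET]

for n, t in CANDIDATES.items():
    print(n, "binary:", is_binary(t), "order ok:", leaves(t) == TARGET)
print("survivors:", survivors)
```

Under these assumptions (4) fails the binarity test, (3) fails the word-order test, and (1) and (2) both pass, so the sketch says nothing about which of those two is preferable.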