Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating a few jbovlaste entries

On Wed, Feb 4, 2015 at 12:45 PM, And Rosta <and.rosta@gmail.com> wrote:

But starting to tackle (3') is not so daunting:
Step 1: What is the least clunky way of getting unambiguously from
phonological words to logical form -- from the phonological words of
Lojban sentences to the logical forms of Lojban sentences (with the
notion of Lojban sentence defined by usage or consensus)? Any
loglanger could have a stab at tackling this.

The least clunky (and only) way we have today to do this is parsers+Tersmu.

Step 2: Identify any devices that are absent from natlangs.
Step 3: Redo Step 1, without using devices identified in Step 2.

We have done some of Step 2 by way of reforming our parsing grammars, albeit mostly unofficially for now, though usually the motivation is not explicitly so much that the devices are absent from natlangs but that we dislike them (unfortunately there's also sometimes a tendency to add to the weirdness, but we can hope that common sense will prevail in the end).

Reflecting on this further, during the couple of weeks it's taken for
me to find the time to finish this reply, I would suggest that
*official*, *definitional* specification of the grammar consist only
of a set of sentences defined as pairings of phonological and logical
forms (ideally, consistent with the 'monoparsing' precept that to
every phonological form there must correspond no more than one logical
form).

But how do we identify those sentences if not through some generating algorithm? Or do you mean just a finite list of sample sentences, in which case, where do we get them from?

Then, any rule set that generates that set of pairings would be
deemed to count as a valid grammar of Lojban, and then from among the
valid grammars we could seek the one(s) that are closest to those
internalized by human speakers.

Would it have to be a rule set that generates that set of pairings and only that set, or could it also generate new sentences? I'm not clear on whether you mean the initial set to be a finite sample from which to generalize, or the complete language.

We currently don't have a clear idea of what syntactic words Lojban
has, where by "syntactic word" I mean ingredients of logicosyntactic
form, the form that encodes logical structure. Some phonological words
seem to correspond to chunks of logical structure rather than single
nodes, and there will be instances of nodes in logical structure that
don't correspond to anything in phonology (-- the most obvious example
is ellipsis, which Lojban sensibly makes heavy use of).

Could you give an example of a phonological word that would correspond to a chunk of logical structure? Do you mean something like "pe" possibly being logically equivalent to "poi ke'a co'e" for example? Would that mean that "pe" does not correspond to a syntactic word?

I don't see a problem in considering the empty phonological string as corresponding to a syntactic word, and in fact some of the parsers do exactly that in dealing with terminators. (Not sure if any parser does that yet in dealing with "zo'e", but then current parsers don't know the number of arguments that a predicate has.)

> What I meant to say is that I can't see a syntax as an intrinsic feature of a natlang, as opposed to being just a model, which can be a better or worse fit, but it can never be the language.

Are holding for natlangs the view that I propose above for Lojban,
namely that a language is a set of sentences, i.e. form--meaning
correspondences, and although in practice there must be some system
for generating that set, it doesn't matter what the system is, so long
as it generates the right set, and therefore in that sense the system
is not intrinsic to language?

If Yes, I don't agree, but I think the position is coherent enough
that I won't try to dissuade you from it.

If not, do explain again what you mean.

I don't think a natlang can be a set of sentences because a set is much too precise an object to accurately describe a natlang, which would have to be fuzzy. In any case, I don't know what a natlang is, but I do think that a syntactic theory can only be a model for it and not it.

> So I can accept that binary branching syntaxes are more elegant, more perspicuous, etc, I just can't believe they are a feature of the language, just like the description of a house is not a feature of the house. Maybe that's just me not being a linguist.

But could a description of an architectural plan of a house be an
architectural plan of a house? Could a comprehensive explcit
description of a code be a code? Surely yes, and the same for
language.

Certainly, but there could be two different adequate architectural plans of the same house.

I don't know how suitable PEG/YACC/BNF are for natlangs. I must
ruefully confess I know nothing about PEG, despite all the work you've
done with it. AFAIK linguists in the last half century haven't found
BNF necessary or sufficient for their rules, but my meagre knowledge
doesn't extend to knowing the mathematical properties of BNF and other
actually used formalisms, and the relationships between them.

PEG is basically equivalent to BNF for present purposes, it's just an algorithm for providing a tree structure to a string of terminals. One nice thing about it is that PEGs are necessarily unambiguous, basically by prioritizing the rules that BNF gives unprioritized.

In denouncing the suitability of PEG/YACC/BNF, I was really meaning to
denounce treating phonological stuff (e.g. phonological words) as
constituents of terminal nodes in syntactic structures. You said that
terminal nodes are actually selmaho and (iirc?) that the 1--1
correspondence between phonological words and selmaho terminal nodes
is not essential.

The 1-1 correspondence would be between classes of phonological words and selmaho, since for example "mi" and "do" are two phonological words belonging to the same selmaho KOhA. The correspondence between phonological words and selmaho is irrelevant from the point of view of the "syntax" (in scare quotes), which doesn't care at all about phonological form. The "syntax" only works with selmaho. (Perhaps the ZOI delimiter is an exception to this, since the "syntax" has to identify the final delimiter as being the same phonological word as the initial delimiter.)

So in that case my objection would not be to CS
grammars per se but only to the idea that a CS grammar can model a
whole grammar rather than just, say, the combinatorics of syntax. So I
reserve judgement on PEG et al: if they can represent logicosyntactic
structure in full, then they have my blessing.

They can only model the combinatorics and parse trees, they can't model things like co-referentiality.

mu'o mi'e xorxes