Received-SPF: pass (google.com: domain of and.rosta@gmail.com designates 2a00:1450:400c:c00::22f as permitted sender) client-ip=2a00:1450:400c:c00::22f;
Message-ID: <54BEE656.9090807@gmail.com>
Date: Tue, 20 Jan 2015 23:35:50 +0000
From: And Rosta <and.rosta@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120711 Thunderbird/14.0
MIME-Version: 1.0
To: lojban@googlegroups.com
Subject: Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating
 a few jbovlaste entries
References: <0CD5A578A47549238B8B046A01B8846C@gmail.com> <CAP=UV6rFA6e3bF=mDFBoDSKqo_kxMsh5xWuVbpSe94VhL2Li0Q@mail.gmail.com> <54BCF147.1080803@lojban.org> <54BCFC70.2010805@selpahi.de> <CAO1AUJPR_A0MtGUTjdTV0mYcFNY6sY12Ln0z3RPTYygdZEOodA@mail.gmail.com> <D014ACDCB5974C32A7C1D3D011A0E389@gmail.com> <CAO1AUJPBgGc44EPyuXRUvDVg=EoiGAsTENjqz=6h=Hqy6b3dMg@mail.gmail.com> <CACf3dPnND-3i=8U2y_ExH7dZZbMOFYhbvR+2o0_N+yG64m7vqw@mail.gmail.com> <CAO1AUJMKYQxqqA+ws2cjumSCyDPdFb92OH8XSWqx1v7wL8aE4Q@mail.gmail.com> <54BE4E4F.1060204@gmail.com> <CAO7tK2eRSmGaoF5a3jujdxYzrC8f0X6smpQedAxofyqR7BjqxA@mail.gmail.com> <CACf3dPke=nrnNuz8jAb6Oph+QOsec3LiffVCnejKwXYTSCnVpQ@mail.gmail.com> <CAO7tK2c9tSTb6PnKgqZpTRdqP7Zu07A7s-vJP6QA8GWWObjJKQ@mail.gmail.com>
In-Reply-To: <CAO7tK2c9tSTb6PnKgqZpTRdqP7Zu07A7s-vJP6QA8GWWObjJKQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com

Jorge Llambías, On 20/01/2015 19:38:
> On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and.rosta@gmail.com> wrote:
>
>     On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjllambias@gmail.com> wrote:
>
>
>         Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
>
>         (1) convert the input into a string of phonemes
>         (2) convert the string of phonemes into a string of words
>         (3) determine a tree structure for the string of words
>         (4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
>
>
>     Rather:
>
>     (1') convert the input into a string [or perhaps tree] of phonemes
>     (2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
>     (3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
>
>
> You seem to have just merged (2) and (3) into (2'),

No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').

> which may be more general, but in the particular case of Lojban we
> know that (2') can be achieved in two independent steps, one step
> that takes any string of phonemes and unambiguously dissects it into
> a string of words (possibly including non-lojban words),

yes

> and a second step that takes the resulting string of words as input
> and unambiguously gives a unique tree structure for them (or else
> rejects the string of words as ungrammatical).

No. The second step (my (3')) takes the string of phonological words but it doesn't give a *syntactic* tree structure whose terminal nodes are phonological words, which is what I take "gives a tree structure for them" to mean. Not every syntactic node need correspond to a phonological one (e.g. ellipsis, which Lojban uses) and a phonological word can correspond to more than one syntactic one (e.g. English _you're_ is one phonological word corresponding to a sequence of a pronoun and an auxiliary). Rather, step (3') uses the rules that define correspondences between elements of the sentence's phonology and elements of the sentence's syntax, to find a sentence syntax that -- in Lojban's case, uniquely -- licitly corresponds to the sentence's phonology.
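
As a rough illustration of that correspondence idea, here is a toy Python sketch. It is not part of any Lojban formalism or of Tersmu; the rule table CORRESPONDENCES, the node labels and the function name are all invented for the example. Each phonological word is looked up and contributes one or more syntactic nodes, and each node remembers which phonological word it came from. An elided syntactic node would be one with no source word (index None); the sketch does not try to infer those.

```python
from typing import List, Optional, Tuple

# Hypothetical correspondence rules: one phonological word may
# correspond to a sequence of several syntactic nodes.
CORRESPONDENCES = {
    "you're": ["PRONOUN:you", "AUX:be"],
    "you": ["PRONOUN:you"],
    "swallow": ["VERB:swallow"],
    "late": ["ADJ:late"],
}


def syntactic_nodes(phon_words: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Pair each syntactic node with the index of its phonological word."""
    result: List[Tuple[str, Optional[int]]] = []
    for index, word in enumerate(phon_words):
        for node in CORRESPONDENCES[word]:
            result.append((node, index))
    return result


nodes = syntactic_nodes(["you're", "late"])
# Two phonological words, three syntactic nodes.
print(nodes)
```

The point of the sketch is only that the two levels are related by correspondence rules rather than by the syntactic tree having phonological words as its leaves.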

Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.

>     > If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
>
>     Right. So I think (3) is not a valid step.
>
> But why is it invalid if it achieves the desired result?

It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.


> And what's the alternative, how else could we formalize (2')?

I think I hadn't succeeded in making you understand what I'd meant by (2').

>     (3') should be doable, partly from Tersmu and partly by using some natural language formalism to analyse the syntax (e.g. at minimum make all phrases headed and forbid unary branching; binary branching would be a bonus if it could be managed).
>
> In order to do (3'), we first need to do (2'). PEG does (2') (and so
> does Yacc+its preparser, with some limitations). And the resulting
> tree has enough detail (in the labeling of its nodes) to give us a
> head start with (3'). I assume Tersmu uses the output of one of these
> as its input.

I hope I've now explained where the misstep is, and how the product (i.e. the supposed operation/definition of the grammar) is something that isn't a human language.

> The current PEG doesn't produce binary branching exclusively,
> although it can probably be tweaked to do that by adding many
> intermediate rules. Why is unary branching bad?

Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.

> There are many rules where one of the branches is optional, so that
> would result either in an empty leaf or a unary branch.

Say you've got an optionally transitive/intransitive verb, such as English _swallow_. When it has an object, they jointly form a binary branching phrase. When it lacks an object, then there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal node that is superfluous.)
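
To make the distinction concrete, here is a minimal Python sketch (the Node class, the labels and the function name are invented for illustration, not taken from any parser discussed here). It flags exactly the case called superfluous above: a node with a single daughter where that daughter is itself nonterminal. A single terminal daughter is left alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children


def superfluous_unary_nodes(node: Node) -> List[str]:
    """Labels of nodes whose only daughter is a nonterminal node."""
    found: List[str] = []
    if len(node.children) == 1 and not node.children[0].is_terminal():
        found.append(node.label)
    for child in node.children:
        found.extend(superfluous_unary_nodes(child))
    return found


# "I swallow" as a binary phrase whose constituents do not branch.
flat = Node("S", [Node("I"), Node("swallow")])

# The same words with an extra layer: VP has the single daughter V,
# and V in turn dominates the terminal "swallow".
layered = Node("S", [Node("I"), Node("VP", [Node("V", [Node("swallow")])])])

print(superfluous_unary_nodes(flat))     # []
print(superfluous_unary_nodes(layered))  # ['VP']
```

In the layered tree only VP is reported; V over the terminal _swallow_ is the kind of unary branching that many models allow.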

> Would you want binary branching all the way down to phonemes, or just
> to words?

Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes. I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (English for sure; other languages probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.


--And.
