Received-SPF: pass (google.com: domain of jjllambias@gmail.com designates 2a00:1450:400c:c00::22c as permitted sender) client-ip=2a00:1450:400c:c00::22c;
MIME-Version: 1.0
In-Reply-To: <CACf3dPkp993PNUQrKGgMONLNFHu+4k31zkohmSkeMugyJLzOVQ@mail.gmail.com>
References: <0CD5A578A47549238B8B046A01B8846C@gmail.com>
	<CAP=UV6rFA6e3bF=mDFBoDSKqo_kxMsh5xWuVbpSe94VhL2Li0Q@mail.gmail.com>
	<54BCF147.1080803@lojban.org>
	<54BCFC70.2010805@selpahi.de>
	<CAO1AUJPR_A0MtGUTjdTV0mYcFNY6sY12Ln0z3RPTYygdZEOodA@mail.gmail.com>
	<D014ACDCB5974C32A7C1D3D011A0E389@gmail.com>
	<CAO1AUJPBgGc44EPyuXRUvDVg=EoiGAsTENjqz=6h=Hqy6b3dMg@mail.gmail.com>
	<CACf3dPnND-3i=8U2y_ExH7dZZbMOFYhbvR+2o0_N+yG64m7vqw@mail.gmail.com>
	<CAO1AUJMKYQxqqA+ws2cjumSCyDPdFb92OH8XSWqx1v7wL8aE4Q@mail.gmail.com>
	<54BE4E4F.1060204@gmail.com>
	<CAO7tK2eRSmGaoF5a3jujdxYzrC8f0X6smpQedAxofyqR7BjqxA@mail.gmail.com>
	<CACf3dPke=nrnNuz8jAb6Oph+QOsec3LiffVCnejKwXYTSCnVpQ@mail.gmail.com>
	<CAO7tK2c9tSTb6PnKgqZpTRdqP7Zu07A7s-vJP6QA8GWWObjJKQ@mail.gmail.com>
	<54BEE656.9090807@gmail.com>
	<CAO7tK2er6zTbCJUJ+OBL2mfAg_Ziio6_TfPYgRxRSfWbVVcc_A@mail.gmail.com>
	<54BFC0F4.1010600@gmail.com>
	<CAO7tK2c29phPzkuuWOezvkbS6O_4yrnC2B=kwWZUZDzfyapzGQ@mail.gmail.com>
	<CACf3dPkp993PNUQrKGgMONLNFHu+4k31zkohmSkeMugyJLzOVQ@mail.gmail.com>
Date: Wed, 4 Feb 2015 19:05:38 -0300
Message-ID: <CAO7tK2dmj9a_eB22JcvD=OTW9EieGD1zbfzTASSZ6tB=QEK7bw@mail.gmail.com>
Subject: Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating
 a few jbovlaste entries
From: =?UTF-8?Q?Jorge_Llamb=C3=ADas?= <jjllambias@gmail.com>
To: lojban@googlegroups.com
Content-Type: multipart/alternative; boundary=f46d04440490413db9050e4a63da
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com

--f46d04440490413db9050e4a63da
Content-Type: text/plain; charset=UTF-8

On Wed, Feb 4, 2015 at 12:45 PM, And Rosta <and.rosta@gmail.com> wrote:

>
> But starting to tackle (3') is not so daunting:
> Step 1: What is the least clunky way of getting unambiguously from
> phonological words to logical form -- from the phonological words of
> Lojban sentences to the logical forms of Lojban sentences (with the
> notion of Lojban sentence defined by usage or consensus)? Any
> loglanger could have a stab at tackling this.
>

The least clunky (and only) way we have today to do this is parsers+Tersmu.


> Step 2: Identify any devices that are absent from natlangs.
> Step 3: Redo Step 1, without using devices identified in Step 2.
>

We have done some of Step 2 by reforming our parsing grammars, albeit
mostly unofficially for now. Usually, though, the explicit motivation is not
so much that the devices are absent from natlangs as that we dislike them.
(Unfortunately there is also sometimes a tendency to add to the weirdness,
but we can hope that common sense will prevail in the end.)


> Reflecting on this further, during the couple of weeks it's taken for
> me to find the time to finish this reply, I would suggest that
> *official*, *definitional* specification of the grammar consist only
> of a set of sentences defined as pairings of phonological and logical
> forms (ideally, consistent with the 'monoparsing' precept that to
> every phonological form there must correspond no more than one logical
> form).


But how do we identify those sentences if not through some generating
algorithm? Or do you mean just a finite list of sample sentences, in which
case, where do we get them from?


> Then, any rule set that generates that set of pairings would be
> deemed to count as a valid grammar of Lojban, and then from among the
> valid grammars we could seek the one(s) that are closest to those
> internalized by human speakers.
>

Would it have to be a rule set that generates that set of pairings and only
that set, or could it also generate new sentences? I'm not clear on whether
you mean the initial set to be a finite sample from which to generalize, or
the complete language.


> We currently don't have a clear idea of what syntactic words Lojban
> has, where by "syntactic word" I mean ingredients of logicosyntactic
> form, the form that encodes logical structure. Some phonological words
> seem to correspond to chunks of logical structure rather than single
> nodes, and there will be instances of nodes in logical structure that
> don't correspond to anything in phonology (-- the most obvious example
> is ellipsis, which Lojban sensibly makes heavy use of).
>

Could you give an example of a phonological word that would correspond to a
chunk of logical structure? Do you mean something like "pe" possibly being
logically equivalent to "poi ke'a co'e" for example?  Would that mean that
"pe" does not correspond to a syntactic word?

I don't see a problem in considering the empty phonological string as
corresponding to a syntactic word, and in fact some of the parsers do
exactly that in dealing with terminators. (Not sure if any parser does that
yet in dealing with "zo'e", but then current parsers don't know the number
of arguments that a predicate has.)
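
A minimal sketch of the terminator case in Python, for illustration only (it
is not taken from any existing parser; the rule "sumti <- LE BRIVLA KU?", the
tiny word lists and the function name are assumptions made up for the
example). The KU node is always present in the tree, and its phonological
content is simply the empty string when the terminator is elided:

```python
# Toy word lists; a real lexicon would be far larger (assumption).
LE = {"le", "lo"}
BRIVLA = {"gerku", "mlatu"}

def parse_description(words, pos=0):
    """Parse `LE BRIVLA KU?` starting at `pos`.

    Returns (tree, next_pos), or None if the rule does not match.
    The KU terminal is always part of the tree: it carries "ku" when
    the word is pronounced and "" when it is elided.
    """
    if pos + 2 > len(words):
        return None
    if words[pos] not in LE or words[pos + 1] not in BRIVLA:
        return None
    gadri, selbri = words[pos], words[pos + 1]
    pos += 2
    if pos < len(words) and words[pos] == "ku":
        ku, pos = "ku", pos + 1      # explicit terminator
    else:
        ku = ""                      # elided: empty phonological string
    tree = ("sumti", ("LE", gadri), ("BRIVLA", selbri), ("KU", ku))
    return tree, pos

explicit_tree, _ = parse_description(["lo", "gerku", "ku"])
elided_tree, _ = parse_description(["lo", "gerku"])
print(explicit_tree)
print(elided_tree)
```

Both trees have the same shape, which is the sense in which the empty string
can stand for a syntactic word.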

> > What I meant to say is that I can't see a syntax as an intrinsic feature
> > of a natlang, as opposed to being just a model, which can be a better or
> > worse fit, but it can never be the language.
>
> Are you holding for natlangs the view that I propose above for Lojban,
> namely that a language is a set of sentences, i.e. form--meaning
> correspondences, and although in practice there must be some system
> for generating that set, it doesn't matter what the system is, so long
> as it generates the right set, and therefore in that sense the system
> is not intrinsic to language?
>
> If Yes, I don't agree, but I think the position is coherent enough
> that I won't try to dissuade you from it.
>
> If not, do explain again what you mean.
>

I don't think a natlang can be a set of sentences, because a set is much too
precise an object to describe a natlang accurately; a natlang would have to
be fuzzy. In any case, I don't know what a natlang is, but I do think that a
syntactic theory can only be a model of it, not the thing itself.

> > So I can accept that binary branching syntaxes are more elegant, more
> > perspicuous, etc, I just can't believe they are a feature of the language,
> > just like the description of a house is not a feature of the house. Maybe
> > that's just me not being a linguist.
>
> But could a description of an architectural plan of a house be an
> architectural plan of a house? Could a comprehensive explicit
> description of a code be a code? Surely yes, and the same for
> language.
>

Certainly, but there could be two different adequate architectural plans of
the same house.

> I don't know how suitable PEG/YACC/BNF are for natlangs. I must
> ruefully confess I know nothing about PEG, despite all the work you've
> done with it. AFAIK linguists in the last half century haven't found
> BNF necessary or sufficient for their rules, but my meagre knowledge
> doesn't extend to knowing the mathematical properties of BNF and other
> actually used formalisms, and the relationships between them.
>

PEG is basically equivalent to BNF for present purposes: it's just a way of
assigning a tree structure to a string of terminals. One nice thing about it
is that PEGs are necessarily unambiguous, basically because they prioritize
the alternatives that BNF leaves unprioritized.
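
To make the prioritizing concrete, here is a minimal sketch in Python. It is
a toy written for illustration, not the machinery of any actual Lojban
parser; the grammar S <- A B and all the function names are made up. The same
rules are run once with BNF-style unordered choice, which keeps every
alternative, and once with PEG-style ordered choice, which commits to the
first alternative that matches:

```python
from typing import Callable, List, Tuple

Result = Tuple[object, int]                  # (parse tree, next position)
Parser = Callable[[str, int], List[Result]]

def lit(s: str) -> Parser:
    """Match the literal string `s`."""
    def parse(text, pos):
        return [(s, pos + len(s))] if text.startswith(s, pos) else []
    return parse

def seq(*parsers: Parser) -> Parser:
    """Match the parsers one after another, combining all their results."""
    def parse(text, pos):
        results = [([], pos)]
        for p in parsers:
            results = [(trees + [t], nxt)
                       for trees, cur in results
                       for t, nxt in p(text, cur)]
        return [(tuple(trees), nxt) for trees, nxt in results]
    return parse

def choice_unordered(*parsers: Parser) -> Parser:
    """BNF-style choice: every alternative that matches is kept."""
    def parse(text, pos):
        return [r for p in parsers for r in p(text, pos)]
    return parse

def choice_ordered(*parsers: Parser) -> Parser:
    """PEG-style choice: commit to the first alternative that matches."""
    def parse(text, pos):
        for p in parsers:
            results = p(text, pos)
            if results:
                return results[:1]
        return []
    return parse

def full_parses(parser: Parser, text: str) -> list:
    """Trees of the parses that consume the whole input."""
    return [tree for tree, end in parser(text, 0) if end == len(text)]

def grammar(choice) -> Parser:
    # S <- A B ;  A <- "a" | "aa" ;  B <- "aa" | "a"
    a = choice(lit("a"), lit("aa"))
    b = choice(lit("aa"), lit("a"))
    return seq(a, b)

print(full_parses(grammar(choice_unordered), "aaa"))  # two trees
print(full_parses(grammar(choice_ordered), "aaa"))    # one tree
```

With unordered choice the string "aaa" gets two trees, ("a", "aa") and
("aa", "a"); with ordered choice it gets only the first. The price is that
an ordered-choice grammar can also reject strings that the corresponding
unordered grammar would accept, so the order of alternatives has to be chosen
with care.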

> In denouncing the suitability of PEG/YACC/BNF, I was really meaning to
> denounce treating phonological stuff (e.g. phonological words) as
> constituents of terminal nodes in syntactic structures. You said that
> terminal nodes are actually selmaho and (iirc?) that the 1--1
> correspondence between phonological words and selmaho terminal nodes
> is not essential.


The 1-1 correspondence would be between classes of phonological words and
selmaho, since for example "mi" and "do" are two phonological words
belonging to the same selmaho KOhA. The correspondence between phonological
words and selmaho is irrelevant from the point of view of the "syntax" (in
scare quotes), which doesn't care at all about phonological form. The
"syntax" only works with selmaho. (Perhaps the ZOI delimiter is an
exception to this, since the "syntax" has to identify the final delimiter
as being the same phonological word as the initial delimiter.)
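
A small Python sketch of that division of labour, under stated assumptions:
the five-word lexicon, the "zoi-quote" tag and the function name are
hypothetical, and real parsers do much more. Words are replaced by their
selmaho before the "syntax" sees them, and only ZOI has to look at the
phonological form, to find the closing delimiter:

```python
# Tiny illustrative lexicon mapping phonological words to selmaho.
LEXICON = {
    "mi": "KOhA",
    "do": "KOhA",
    "zoi": "ZOI",
    "cusku": "BRIVLA",
    "prami": "BRIVLA",
}

def to_selmaho(words):
    """Turn a list of phonological words into the selmaho stream.

    Everything after this step works on classes only, except that a ZOI
    quote must be closed by the same word that opened it.
    """
    out = []
    i = 0
    while i < len(words):
        cls = LEXICON[words[i]]
        out.append(cls)
        i += 1
        if cls == "ZOI":
            if i >= len(words):
                raise ValueError("ZOI without a delimiter")
            delimiter = words[i]          # any word may serve as delimiter
            try:
                close = words.index(delimiter, i + 1)
            except ValueError:
                raise ValueError("unterminated ZOI quote") from None
            out.append(("zoi-quote", " ".join(words[i + 1:close])))
            i = close + 1
    return out

print(to_selmaho("mi prami do".split()))
print(to_selmaho("do prami mi".split()))
print(to_selmaho("mi cusku zoi gy hello world gy".split()))
```

The first two sentences give the same stream, KOhA BRIVLA KOhA, so nothing
downstream can tell "mi" from "do"; the third shows the one place where the
identity of the word itself matters.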


> So in that case my objection would not be to CS
> grammars per se but only to the idea that a CS grammar can model a
> whole grammar rather than just, say, the combinatorics of syntax. So I
> reserve judgement on PEG et al: if they can represent logicosyntactic
> structure in full, then they have my blessing.


They can only model the combinatorics and parse trees; they can't model
things like co-referentiality.

mu'o mi'e xorxes


--f46d04440490413db9050e4a63da--