Received-SPF: pass (google.com: domain of jjllambias@gmail.com designates 2a00:1450:400c:c05::230 as permitted sender) client-ip=2a00:1450:400c:c05::230;
MIME-Version: 1.0
In-Reply-To: <CACf3dPke=nrnNuz8jAb6Oph+QOsec3LiffVCnejKwXYTSCnVpQ@mail.gmail.com>
References: <0CD5A578A47549238B8B046A01B8846C@gmail.com>
	<CAP=UV6rFA6e3bF=mDFBoDSKqo_kxMsh5xWuVbpSe94VhL2Li0Q@mail.gmail.com>
	<54BCF147.1080803@lojban.org>
	<54BCFC70.2010805@selpahi.de>
	<CAO1AUJPR_A0MtGUTjdTV0mYcFNY6sY12Ln0z3RPTYygdZEOodA@mail.gmail.com>
	<D014ACDCB5974C32A7C1D3D011A0E389@gmail.com>
	<CAO1AUJPBgGc44EPyuXRUvDVg=EoiGAsTENjqz=6h=Hqy6b3dMg@mail.gmail.com>
	<CACf3dPnND-3i=8U2y_ExH7dZZbMOFYhbvR+2o0_N+yG64m7vqw@mail.gmail.com>
	<CAO1AUJMKYQxqqA+ws2cjumSCyDPdFb92OH8XSWqx1v7wL8aE4Q@mail.gmail.com>
	<54BE4E4F.1060204@gmail.com>
	<CAO7tK2eRSmGaoF5a3jujdxYzrC8f0X6smpQedAxofyqR7BjqxA@mail.gmail.com>
	<CACf3dPke=nrnNuz8jAb6Oph+QOsec3LiffVCnejKwXYTSCnVpQ@mail.gmail.com>
Date: Tue, 20 Jan 2015 16:38:42 -0300
Message-ID: <CAO7tK2c9tSTb6PnKgqZpTRdqP7Zu07A7s-vJP6QA8GWWObjJKQ@mail.gmail.com>
Subject: Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating
 a few jbovlaste entries
From: =?UTF-8?Q?Jorge_Llamb=C3=ADas?= <jjllambias@gmail.com>
To: lojban@googlegroups.com
Content-Type: multipart/alternative; boundary=f46d0442885c1dda25050d1a9644
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com

--f46d0442885c1dda25050d1a9644
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and.rosta@gmail.com> wrote:
>
> On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjllambias@gmail.com>
> wrote:
>
>>
>> Would it be fair to say that what an actual grammar should do is, given
>> some input of sound or written characters, tell us how to:
>>
>> (1) convert the input into a string of phonemes
>> (2) convert the string of phonemes into a string of words
>> (3) determine a tree structure for the string of words
>> (4) determine which nodes of the tree are terms, which nodes are
>> predicates, which terms are co-referring, and which terms are arguments of
>> which predicates
>>
>
> Rather:
>
> (1') convert the input into a string [or perhaps tree] of phonemes
> (2') convert the string [or perhaps tree] of phonemes into a string [or
> perhaps (prosodic) tree] of phonological words
> (3') map the tree of phonological words to a structure of syntactic
> 'words'/'nodes', which structure will specify which nodes of the tree are
> terms, which nodes are predicates, which terms are co-referring, and which
> terms are arguments of which predicates
>

You seem to have just merged (2) and (3) into (2'). That may be more
general, but in the particular case of Lojban we know that (2') can be
achieved in two independent steps. The first takes any string of
phonemes and unambiguously dissects it into a string of words (possibly
including non-Lojban words). The second takes the resulting string of
words as input and either assigns it a unique tree structure or rejects
it as ungrammatical. That probably doesn't work for natlangs in general.
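
As a rough illustration of that two-step split, here is a toy Python
sketch. Everything in it is an assumption made up for the example: the
"one consonant plus vowels" word shape, the three-word LEXICON, and the
"term predicate term*" sentence rule are hypothetical and are not
Lojban's actual morphology or grammar.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon: word -> word class (not real Lojban meanings).
LEXICON = {"ba": "term", "di": "term", "ku": "predicate"}


def segment(phonemes: str) -> List[str]:
    """Step one: split a phoneme string into words.

    Toy rule: every word is one consonant followed by one or more vowels,
    so each consonant starts a new word and the split is unique.
    """
    words = re.findall(r"[bcdfgjklmnprstvxz][aeiou]+", phonemes)
    if "".join(words) != phonemes:
        raise ValueError(f"cannot segment {phonemes!r}")
    return words


def parse(words: List[str]) -> Tuple[str, list]:
    """Step two: build the unique tree for a word string, or reject it.

    Toy rule: sentence = term predicate term*.
    """
    classes = []
    for word in words:
        if word not in LEXICON:
            # Segmentation may yield words the grammar does not know.
            raise ValueError(f"unknown word {word!r}")
        classes.append(LEXICON[word])
    well_formed = (
        len(classes) >= 2
        and classes[0] == "term"
        and classes[1] == "predicate"
        and all(c == "term" for c in classes[2:])
    )
    if not well_formed:
        raise ValueError("ungrammatical word string")
    return ("sentence", [(c, [w]) for c, w in zip(classes, words)])
```

Here segment("bakudi") gives ["ba", "ku", "di"] without consulting the
grammar at all, and parse() then either returns one tree or raises. The
string "bakuzo" still segments cleanly, but parse() rejects it because
"zo" is not in the toy lexicon.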

>> If that's more or less on track, then we can say that the YACC/EBNF
>> formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu
>> is trying to do (4). I would agree that the way our formal grammars do (3)
>> is probably not much like the way our brains do (3), but I'm not sure I see
>> what alternative we have.
>
> Right. So I think (3) is not a valid step.
>
>

But why is it invalid if it achieves the desired result? And what's the
alternative? How else could we formalize (2')?


> (3') should be doable, partly from Tersmu and partly by using some natural
> language formalism to analyse the syntax (e.g. at minimum make all phrases
> headed and forbid unary branching; binary branching would be a bonus if it
> could be managed).
>

In order to do (3'), we first need to do (2'). PEG does (2'), and so does
Yacc plus its preparser, with some limitations. The resulting tree has
enough detail (in the labeling of its nodes) to give us a head start with
(3'). I assume Tersmu uses the output of one of these as its input.
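
To make the "head start" concrete, here is a minimal Python sketch of
reading a predicate and its arguments off node labels. The tree shape
(label, children), the labels "term" and "predicate", and the sample
words are all assumptions for illustration. This is not the output
format of the PEG or Yacc parsers, and not how Tersmu works.

```python
from typing import Dict, List, Union

# Hypothetical tree shape: a leaf is a word (str); an inner node is a
# (label, children) pair.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Collect the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    words: List[str] = []
    for child in children:
        words.extend(leaves(child))
    return words


def predicate_arguments(sentence: tuple) -> Dict[str, object]:
    """Use the labels of a sentence node's children to pick out the
    predicate and, in order, the terms that fill its argument places."""
    _label, children = sentence
    predicate = None
    arguments: List[str] = []
    for child in children:
        if isinstance(child, str):
            continue  # a bare word carries no label to classify it by
        child_label, _ = child
        text = " ".join(leaves(child))
        if child_label == "predicate":
            predicate = text
        elif child_label == "term":
            arguments.append(text)
    return {"predicate": predicate, "arguments": arguments}
```

For the hypothetical tree
("sentence", [("term", ["ba"]), ("predicate", ["ku"]), ("term", ["di"])])
this returns "ku" as the predicate with arguments ["ba", "di"]. It says
nothing about co-reference, which needs more than node labels.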

The current PEG doesn't produce binary branching exclusively, although it
can probably be tweaked to do that by adding many intermediate rules. Why
is unary branching bad? There are many rules where one of the branches is
optional, so when that branch is absent we would get either an empty leaf
or a unary branch. Would you want binary branching all the way down to
phonemes, or just to words?
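
For what it's worth, here is a small Python sketch of what removing
unary branches and forcing binary branching could look like. It
post-processes a finished tree instead of adding intermediate rules to
the grammar, which is a swapped-in shortcut for the example. The
(label, children) tree shape, the primed labels such as "sentence'",
and the right-branching choice are all assumptions.

```python
from typing import Union

# Hypothetical tree shape: a leaf is a word (str); an inner node is a
# (label, children) pair.
Tree = Union[str, tuple]


def collapse_unary(tree: Tree) -> Tree:
    """Replace every node that has exactly one child by that child.
    Note that the collapsed node's label is discarded."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [collapse_unary(child) for child in children]
    if len(children) == 1:
        return children[0]
    return (label, children)


def binarize(tree: Tree) -> Tree:
    """Make every node with more than two children right-branching by
    grouping its last two children under an intermediate node."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [binarize(child) for child in children]
    while len(children) > 2:
        # The intermediate node plays the role of an added grammar rule.
        children = children[:-2] + [(label + "'", children[-2:])]
    return (label, children)
```

One thing the sketch makes visible: collapsing ("term", ["ba"]) to the
bare word "ba" throws away the label "term", so forbidding unary
branches means that information has to be kept somewhere else.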

mu'o mi'e xorxes


--f46d0442885c1dda25050d1a9644--