From: Jorge Llambías <jjllambias@gmail.com>
To: lojban@googlegroups.com
Date: Tue, 20 Jan 2015 16:38:42 -0300
Subject: Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating a few jbovlaste entries

On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and.rosta@gmail.com> wrote:
> On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjllambias@gmail.com> wrote:
>> Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
>>
>> (1) convert the input into a string of phonemes
>> (2) convert the string of phonemes into a string of words
>> (3) determine a tree structure for the string of words
>> (4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates

> Rather:
>
> (1') convert the input into a string [or perhaps tree] of phonemes
> (2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
> (3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
You seem to have just merged (2) and (3) into (2'), which may be more general, but in the particular case of Lojban we know that (2') can be achieved in two independent steps: one step that takes any string of phonemes and unambiguously dissects it into a string of words (possibly including non-Lojban words), and a second step that takes the resulting string of words as input and unambiguously gives a unique tree structure for them (or else rejects the string of words as ungrammatical). That probably doesn't work for natlangs in general.
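
To make the two-step split concrete, here is a minimal Python sketch for a made-up toy language. The word shape, the lexicon, the word classes and the one-rule grammar are all assumptions invented for illustration; this is not Lojban's actual morphology, nor the PEG or YACC grammar. It only shows the shape of the claim: step one accepts any input and never rejects, step two either returns one tree or rejects.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy lexicon; the word classes are invented for this sketch.
TERMS = {"mi", "do", "ti"}
PREDICATES = {"ba", "ka"}

Tree = Tuple[str, list]  # (label, children)


def segment(phonemes: str) -> List[str]:
    """Step one: cut ANY input into words; this step never rejects.

    Toy rule (an assumption): a word is one consonant plus one vowel.
    Any other non-space character becomes a one-character unknown word.
    """
    return re.findall(r"[^aeiou\s][aeiou]|\S", phonemes)


def parse(words: List[str]) -> Optional[Tree]:
    """Step two: return the single tree for the words, or None to reject.

    Toy grammar (an assumption): sentence <- term* predicate term*
    """
    positions = [i for i, w in enumerate(words) if w in PREDICATES]
    if len(positions) != 1:
        return None  # zero or several predicates: ungrammatical here
    p = positions[0]
    if any(w not in TERMS for w in words[:p] + words[p + 1:]):
        return None  # an unknown word survives step one but fails step two
    before = [("term", [w]) for w in words[:p]]
    after = [("term", [w]) for w in words[p + 1:]]
    return ("sentence", before + [("predicate", [words[p]])] + after)
```

In this toy, `parse(segment("midoba"))` yields a tree, while `parse(segment("mibax"))` is rejected: the stray `x` is still delivered as a word by the first step and only fails in the second.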

>> If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.

> Right. So I think (3) is not a valid step.

But why is it invalid if it achieves the desired result? And what's the alternative? How else could we formalize (2')?
> (3') should be doable, partly from Tersmu and partly by using some natural language formalism to analyse the syntax (e.g. at minimum make all phrases headed and forbid unary branching; binary branching would be a bonus if it could be managed).

In order to do (3'), we first need to do (2'). PEG does (2') (and so does Yacc plus its preparser, with some limitations), and the resulting tree has enough detail (in the labeling of its nodes) to give us a head start with (3'). I assume Tersmu uses the output of one of these as its input.
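
As a rough illustration of what "a head start from the node labels" could mean, here is a small Python sketch that reads predicate-argument pairs off a labeled tree. The labels `sentence`, `term` and `predicate` and the tree encoding are assumptions made up for this example; it is not how Tersmu works and does not use the real PEG rule names.

```python
from typing import List, Tuple, Union

# A leaf is a word (str); a node is (label, [children]).
Tree = Union[str, tuple]


def predications(tree: Tree) -> List[Tuple[str, List[str]]]:
    """Collect (predicate, [arguments]) for every 'sentence' node.

    Assumption: each 'term' or 'predicate' node directly dominates one word,
    and the terms of a sentence are the arguments of its predicate, in order.
    """
    if isinstance(tree, str):
        return []
    label, children = tree
    found = []
    if label == "sentence":
        nodes = [c for c in children if not isinstance(c, str)]
        preds = [c[1][0] for c in nodes if c[0] == "predicate"]
        args = [c[1][0] for c in nodes if c[0] == "term"]
        if preds:
            found.append((preds[0], args))
    for child in children:
        found.extend(predications(child))  # also visit embedded sentences
    return found
```

Co-reference and anything else that is not visible in the labels would still need a separate pass; the sketch only covers the part the labels give for free.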

The current PEG doesn't produce binary branching exclusively, although it can probably be tweaked to do that by adding many intermediate rules. Why is unary branching bad? There are many rules where one of the branches is optional, so that would result either in an empty leaf or a unary branch. Would you want binary branching all the way down to phonemes, or just to words?
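
For what it's worth, both transformations can also be done on the output tree instead of in the grammar. The following Python sketch is only an illustration under assumptions: the tree encoding, the primed labels for intermediate nodes and the `+` used to join collapsed labels are all invented here, and nothing in it is taken from the PEG itself.

```python
from typing import Union

# A leaf is a word (str); a node is (label, [children]).
Tree = Union[str, tuple]


def collapse_unary(tree: Tree) -> Tree:
    """Merge a node with its only child when that child is itself a node.

    A node sitting directly over a single word is left alone.
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [collapse_unary(c) for c in children]
    if len(children) == 1 and not isinstance(children[0], str):
        child_label, grandchildren = children[0]
        return (label + "+" + child_label, grandchildren)
    return (label, children)


def binarize(tree: Tree) -> Tree:
    """Give every node at most two children by adding intermediate nodes."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [binarize(c) for c in children]
    while len(children) > 2:
        # Fold the last two children under a hypothetical primed label.
        children = children[:-2] + [(label + "'", children[-2:])]
    return (label, children)
```

Applied down to the word level only, `binarize(collapse_unary(tree))` leaves no node with more than two children and no node whose only child is another node. It does not settle whether the nodes it introduces are linguistically meaningful, which is the real question.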

mu'o mi'e xorxes
