From: And Rosta
Date: Tue, 20 Jan 2015 23:35:50 +0000
To: lojban@googlegroups.com
Subject: Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating a few jbovlaste entries

Jorge Llambías, On 20/01/2015 19:38:
> On Tue, Jan 20, 2015 at 3:28 PM, And Rosta wrote:
>
> > On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías wrote:
> >
> > > Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
> > >
> > > (1) convert the input into a string of phonemes
> > > (2) convert the string of phonemes into a string of words
> > > (3) determine a tree structure for the string of words
> > > (4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
> >
> > Rather:
> >
> > (1') convert the input into a string [or perhaps tree] of phonemes
> > (2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
> > (3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
>
> You seem to have just merged (2) and (3) into (2'),

No, I meant (2') to be just a restatement of (2), with
the added acknowledgement that in human languages there is tree-like phonological structure above the word level, i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').

> which may be more general, but in the particular case of Lojban we
> know that (2') can be achieved in two independent steps, one step
> that takes any string of phonemes and unambiguously dissects it into
> a string of words (possibly including non-lojban words),

Yes.

> and a second step that takes the resulting string of words as input
> and unambiguously gives a unique tree structure for them (or else
> rejects the string of words as ungrammatical).

No. The second step (my (3')) takes the string of phonological words, but it doesn't give a *syntactic* tree structure whose terminal nodes are phonological words, which is what I take "gives a tree structure for them" to mean. Not every syntactic node need correspond to a phonological one (e.g. ellipsis, which Lojban uses), and a phonological word can correspond to more than one syntactic one (e.g. English _you're_ is one phonological word corresponding to a sequence of a pronoun and an auxiliary). Rather, step (3') uses the rules that define correspondences between elements of the sentence's phonology and elements of the sentence's syntax to find a sentence syntax that (in Lojban's case, uniquely) licitly corresponds to the sentence's phonology.

Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
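
To make the point about step (3') concrete, here is a toy sketch in Python. It is neither Lojban's grammar nor a proposal for how (3') should actually be implemented: the correspondence table REALIZES, the SynWord class and the licit() function are hypothetical illustrations, and the rules are English, not Lojban. It only checks whether a candidate syntactic analysis licitly corresponds to a string of phonological words, allowing one phonological word to realize several syntactic words (_you're_) and a syntactic word to have no phonological realization at all (ellipsis).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SynWord:
    """A syntactic word in a (hypothetical) analysis of a sentence."""
    lemma: str
    # Index of the phonological word that realizes this syntactic word,
    # or None if it is unpronounced (e.g. elided).
    phon: Optional[int]


# Hypothetical toy correspondence rules: the sequence of syntactic lemmas
# that each phonological word may realize. English examples, not Lojban.
REALIZES: Dict[str, Tuple[str, ...]] = {
    "you're": ("you", "are"),  # one phonological word, two syntactic words
    "you": ("you",),
    "are": ("are",),
    "tired": ("tired",),
    "I": ("I",),
    "can": ("can",),
    "swim": ("swim",),
    "and": ("and",),
    "too": ("too",),
}


def licit(phon_words: List[str], syn_words: List[SynWord]) -> bool:
    """Check whether syn_words licitly corresponds to phon_words under REALIZES."""
    # Every syntactic word must point at an existing phonological word or be elided.
    if any(s.phon is not None and not 0 <= s.phon < len(phon_words) for s in syn_words):
        return False
    # Each phonological word must realize exactly the syntactic words the rules allow.
    for i, pw in enumerate(phon_words):
        realized = tuple(s.lemma for s in syn_words if s.phon == i)
        if REALIZES.get(pw) != realized:
            return False
    return True


# "you're tired": one phonological word corresponds to two syntactic words.
contracted = ["you're", "tired"]
print(licit(contracted, [SynWord("you", 0), SynWord("are", 0), SynWord("tired", 1)]))  # True
print(licit(contracted, [SynWord("you", 0), SynWord("tired", 1)]))  # False

# "I can swim and you can too": the second "swim" has no phonological counterpart.
elliptical = ["I", "can", "swim", "and", "you", "can", "too"]
analysis = [
    SynWord("I", 0), SynWord("can", 1), SynWord("swim", 2), SynWord("and", 3),
    SynWord("you", 4), SynWord("can", 5), SynWord("swim", None), SynWord("too", 6),
]
print(licit(elliptical, analysis))  # True
```

The sketch checks a given analysis rather than searching for one; finding the licit analysis (unique, in Lojban's case) would additionally require enumerating candidate syntactic structures, which is left out here. Note also that the syntactic words are not the terminal nodes of a tree over phonological words: the mapping runs between two separate structures.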

> > > If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
> >
> > Right. So I think (3) is not a valid step.
>
> But why is it invalid if it achieves the desired result?

It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.

> And what's the alternative, how else could we formalize (2')?

I think I hadn't succeeded in making you understand what I'd meant by (2').

> > (3') should be doable, partly from Tersmu and partly by using some natural language formalism to analyse the syntax (e.g. at minimum make all phrases headed and forbid unary branching; binary branching would be a bonus if it could be managed).
>
> In order to do (3'), we first need to do (2'). PEG does (2') (and so
> does Yacc+its preparser, with some limitations). And the resulting
> tree has enough detail (in the labeling of its nodes) to give us a
> head start with (3'). I assume Tersmu uses the output of one of these
> as its input.

I hope I've now explained where the misstep is, and how the product (i.e. the supposed operation/definition of the grammar) is something that isn't a human language.

> The current PEG doesn't produce binary branching exclusively,
> although it can probably be tweaked to do that by adding many
> intermediate rules. Why is unary branching bad?

Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.

> There are many rules where one of the branches is optional, so that
> would result either in an empty leaf or a unary branch.

Say you've got an optionally transitive/intransitive verb, such as English _swallow_. When it has an object, they jointly form a binary branching phrase. When it lacks an object, there is no need for any branching; so, for example, _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal daughter node that is superfluous.)

> Would you want binary branching all the way down to phonemes, or just
> to words?

Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes. I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (English for sure; other languages probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.

--And.