From: And Rosta
Date: Wed, 21 Jan 2015 15:08:36 +0000
To: lojban@googlegroups.com
Subject: Re: [lojban] Re: [Llg-members] nu ningau so'u se jbovlaste / updating a few jbovlaste entries

Jorge Llambías, On 21/01/2015 12:33:
> On Tue, Jan 20, 2015 at 8:35 PM, And Rosta wrote:
> > Jorge Llambías, On 20/01/2015 19:38:
> > > On Tue, Jan 20, 2015 at 3:28 PM, And Rosta wrote:
> > > > On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías wrote:
> > > > > Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
> > > > >
> > > > > (1) convert the input into a string of phonemes
> > > > > (2) convert the string of phonemes into a string of words
> > > > > (3) determine a tree structure for the string of words
> > > > > (4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
> > > >
> > > > Rather:
> > > >
> > > > (1') convert the input into a string [or perhaps tree] of phonemes
> > > > (2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
> > > > (3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
> > >
> > > You seem to have just merged (2) and (3) into (2'),
> >
> > No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').
>
> Ok, I see. Then my (3) and (4) are merged into your (3'), with the proviso that you think (3) is either useless or possibly detrimental to achieving (3').

Yes.

> BTW, don't the C's and V's of the traditional definition give some phonological structure beyond mere phoneme strings? The PEG morphology also makes use of syllables and their onset-nucleus-coda components. That's phonological structure, right?

Yes, but I am conscious of being among people more mathematically-minded than I am, so I shrink from attempting to pronounce on what sort of structure goes beyond mere patterning in a string.
At any rate, yes the traditional definition does impose some phonological structure; but whether that is hierarchical rather than linear, I am uncertain.

> > Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
>
> But Tersmu output is basically FOPL, which has its own formal grammar (on which Lojban's formal grammar is based). I still don't see what problems formal grammars create.

(3') must certainly involve a grammar, and I can't think of any sense in which a grammar could meaningfully be called 'informal', so I'm happy to call that grammar 'formal'. But it differs from the CS (or at least the Lojban) notion primarily in not having phonological objects as any of its nodes and secondarily in not necessarily being simply a labelled bracketing of a string.

> > > > > If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
> > > >
> > > > Right. So I think (3) is not a valid step.
> > >
> > > But why is it invalid if it achieves the desired result?
> >
> > It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.
>
> I can accept that, or perhaps "regardless of (3)", but I agree not "because of (3)". But I'm not sure there's much left of Lojban if we remove (3).

To the extent that Lojban is a language, (3) doesn't really constitute any part of Lojban (despite the mistaken belief of many Lojbanists to the contrary).
Also, to the extent that Lojban is a language, there exists an implicit version of (3'), albeit not necessarily one that is coherent or unambiguous. So I would recommend removing the current Formal Grammars from the definition of Lojban, and replacing them by one -- an explicit (3') -- that more credibly represents actual human language (but is unambiguous etc.).

> > > The current PEG doesn't produce binary branching exclusively, although it can probably be tweaked to do that by adding many intermediate rules. Why is unary branching bad?
> >
> > Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.
>
> Ok. As an example, the PEG has:
>
> statement <- statement-1 / prenex statement
>
> statement-1 <- statement-2 (I-clause joik-jek statement-2?)*
>
> The first rule means that a "statement" node can unary branch into a "statement-1" node, or binary branch into "prenex" and "statement" nodes. The PEG could instead just be:
>
> statement <- statement-2 (I-clause joik-jek statement-2?)* / prenex statement
>
> and completely bypass the statement-1 node, which is indeed superfluous. The PEG can probably be re-written so as to eliminate all unary branching, although there may be a price in clarity.

Good. Also questionable is the extent to which a nonterminal node can have properties/labels not simply derived from the label of the head daughter: the range of views among syntacticians is too hard to summarize in one sentence here, but certainly one does not come across syntactic trees for natlang sentences with a pattern of labellings resembling Lojban's, i.e. where the relationship between labels on the mother and the daughters is unconstrained.

> > > There are many rules where one of the branches is optional, so that would result either in an empty leaf or a unary branch.
> >
> > Say you've got an optionally transitive/intransitive verb, such as English _swallow_.
> > When it has an object, they jointly form a binary branching phrase. When it lacks an object, then there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal node that is superfluous.)
>
> OK, but is this more than just aesthetics? Unary branches don't do anything useful, but are they harmful other than in cluttering the tree with superfluous nodes?

They're harmless clutter if there's no contrast with a version of the tree where mother and singleton daughter merge into the same node. You need to consider the branching issue together with the labelling issue. If mother and head-daughter have the same label, then the redundancy of unary branching is plain.

> > Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes.
>
> Ok, but in Lojban there's almost a one-to-one match between phonological and syntactic words.

That remains to be seen, because there isn't yet an explicit real syntax for Lojban. However, it's perfectly possible that in Lojban, phonology--syntax mismatches are rare.

> > I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (-- English for sure; other languages - probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.
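
On the unary-branching point, the merging of a mother with its singleton nonterminal daughter can be sketched in a few lines of Python. This is only an illustration, not anything taken from the actual PEG or its parser: the `Node` class, the `collapse_unary` and `has_unary_nonterminal` helpers, and the toy tree for "broda .i je brode" are all hypothetical, and keeping the mother's label (as the rewritten "statement" rule does) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node; a node with no children stands for a word (terminal)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def collapse_unary(node: Node) -> Node:
    """Return a copy in which every mother with exactly one nonterminal
    daughter is merged with that daughter (the mother's label is kept)."""
    kids = [collapse_unary(child) for child in node.children]
    if len(kids) == 1 and kids[0].children:
        # Singleton nonterminal daughter: bypass it, as the rewritten
        # "statement" rule bypasses "statement-1".
        return Node(node.label, kids[0].children)
    return Node(node.label, kids)


def has_unary_nonterminal(node: Node) -> bool:
    """True if some node has exactly one daughter and that daughter branches."""
    if len(node.children) == 1 and node.children[0].children:
        return True
    return any(has_unary_nonterminal(child) for child in node.children)


# Hypothetical tree for "broda .i je brode" under the two original rules.
tree = Node("statement", [
    Node("statement-1", [
        Node("statement-2", [Node("broda")]),
        Node("I-clause", [Node("i")]),
        Node("joik-jek", [Node("je")]),
        Node("statement-2", [Node("brode")]),
    ]),
])

flat = collapse_unary(tree)
print(has_unary_nonterminal(tree), has_unary_nonterminal(flat))
print(flat.label, [child.label for child in flat.children])
```

Unary branching over a terminal is deliberately left alone here, in line with the concession that many models of syntax allow it.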

> When you say something like "I believe natlang syntax is binary branching" I realize we have a different idea about what syntax is, because I can't have any beliefs one way or the other on whether natlang syntax is binary branching or not. Let me try to explain with a simple Lojban example.

I'm not sure if choosing a simple Lojban example is going to reveal why you can't have beliefs about binary branching in natlangs. Syntax is a set of rules for combining the combinatorial units of syntax in ways that are combinatorially licit and that combine the units' phonological forms and their meanings. I suspect (but excuse me if I'm mistaken) that for you every set of rules that defines the correct set of sentences is equally valid, so that so long as the rules match the right sentence sounds to the right sentence meanings, it doesn't matter what the intermediate structure is like; if the syntactician has a job, it is to work out *a* set of rules, but there is no reason to think there is only one correct set of rules. In contrast, pretty much all linguisticians think (but not always for the same reasons) that of the sets of rules that generate the same, correct, set of sentences, some of those sets are right and some are wrong or at least some are righter and some are wronger. In my case I think the rules matter because (i) to understand the system you need to understand its internal mechanics, and (ii) a speaker knows a certain set of rules, and it's known-rules that are my object of study.

> One could posit several different syntactic structures for the sumti "lo broda ku":
>
> (1) (lo broda)- -ku
> (2) lo- -(broda ku)
> (3) (lo- -ku) -broda-
> (4) lo- -broda- -ku
>
> For me they are all defensible. (1) probably reflects best how "ku" was born, a "spoken comma", something that separates the fully formed sumti "lo broda" from the rest of the sentence.
> (2) may reflect best my psychological introspective understanding of "ku" as a terminator of the sumti-tail. (3) reflects a popular take where lo...ku are brackets around a selbri that convert it into a sumti, and (4) happens to best match what PEG, YACC and BNF do, since they give a node with three branches.
>
> If I understand you correctly, only one of those four could correctly reflect Lojban syntax, whereas for me all four are equally valid takes since in the end it makes no difference which one we choose. Now in the case of Lojban we could say that only one of these is the officially correct syntax (currently that would be 4), but if something like that happens in natlangs, does it make sense to talk of "the syntax" for the natlang as opposed to "a syntax"?

Note that I want to distinguish between "ideas that are obviously wrong" and "ideas I don't agree with"; the main points I wanted to make in this thread pertain to the former sort, whereas my objection to non-binary-branching is of only the latter sort. But anyway, with that caveat declared, I'd say (on the basis of my tentative belief about the binarity of branching) that (4) is invalid because there is no mechanism for building it, and (3) is, absent any additional syntactic structure, also invalid because there is no way to generate the right order of phonological words from that syntactic structure. It's unlikely that the arguments for (1) and (2) are equally strong, but still it's possible that the grammar allows both structures or that there are multiple parallel equally viable grammars. FWIW I, who was never much of a Lojban syntactician, think (1) looks to be better than (2).

--And.
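
The verdicts on the four bracketings of "lo broda ku" can be checked mechanically with a small sketch. It assumes, purely for illustration, that a structure is a nested tuple of words and that word order is read off by a plain left-to-right traversal; the names `TARGET`, `candidates`, `leaves` and `is_binary` are hypothetical, and nothing here comes from the PEG, YACC or BNF grammars.

```python
# Intended order of phonological words.
TARGET = ["lo", "broda", "ku"]

# The four candidate structures, numbered as in the message (assumed encoding).
candidates = {
    1: (("lo", "broda"), "ku"),
    2: ("lo", ("broda", "ku")),
    3: (("lo", "ku"), "broda"),
    4: ("lo", "broda", "ku"),
}


def leaves(tree):
    """Words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


def is_binary(tree):
    """True if every branching node has exactly two daughters."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


for number, tree in candidates.items():
    # Columns: structure number, binary-branching?, yields the target order?
    print(number, is_binary(tree), leaves(tree) == TARGET)
```

Under these assumptions only (1) and (2) pass both checks: (4) is not binary, and (3) linearizes as "lo ku broda" unless some additional structure fixes the order.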