From: Gleki Arxokuna
Date: Wed, 11 Feb 2015 15:50:25 +0300
Subject: Re: [lojban] the myth of monoparsing
To: "lojban@googlegroups.com"
References: <20150204124517.GA1243@kuebelreiter.informatik.Uni-Osnabrueck.DE>
Reply-To: lojban@googlegroups.com
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11335b74b86553050ecf7392

--001a11335b74b86553050ecf7392
Content-Type: text/plain; charset=UTF-8

2015-02-09 23:22 GMT+03:00 ianek:

> On Monday, February 9, 2015 at 11:54:41 AM UTC+1, la gleki wrote:
>> 2015-02-08 4:34 GMT+03:00 ianek:
>>> On Friday, February 6, 2015 at 8:13:30 AM UTC+1, la gleki wrote:
>>>> 2015-02-04 15:45 GMT+03:00 v4hn:
>>>>> On Tue, Feb 03, 2015 at 11:42:32AM +0300, Gleki Arxokuna wrote:
>>>>> > "Fred saw a plane flying over Zurich" can have several meanings
>>>>>
>>>>> Yes.
>>>>> However, for me, the issue here is that we (hopefully..) agree
>>>>> that there are different parse trees (which yield the different
>>>>> meanings).
>>>>
>>>> No, several trees arise after you interpret the sentence.
>>>
>>> But if you had an English parser, it would yield several trees
>>> without any interpreting.
>>
>> Sure! Because English parsers lack the ability to find something
>> common in all of the parse trees.
>
> No. It's because words in an English sentence can be parsed as
> different syntactic structures. That's what parsing means: determining
> structures formed by words. Not "finding something common".

You yourself just showed several parses of the same sentence.
That is how English parsers are usually constructed.
However, there is another option: monoparsing this English sentence.
You are conflating the English language with one current theory of how
to parse it.

>>> Like this:
>>>
>>> "Fred saw a plane flying over Zurich"
>>> NAME VERB-PAST ARTICLE COUNTABLE-NOUN VERB-ING PREPOSITION NAME
>>>
>>> Some (much simplified) rules could be:
>>>
>>> Sentence ::= Noun-Phrase Verb Noun-Phrase
>>> Sentence ::= Noun-Phrase Verb Noun-Phrase Adverbial-Phrase
>>> Noun-Phrase ::= NAME | ARTICLE COUNTABLE-NOUN
>>>               | Noun-Phrase VERB-ING Prepositional-Clause
>>> Verb ::= VERB-PAST
>>> Adverbial-Phrase ::= VERB-ING Prepositional-Clause
>>> Prepositional-Clause ::= PREPOSITION Noun-Phrase
>>>
>>> This simple grammar yields two parse trees for that sentence:
>>>
>>> Sentence
>>> ----Noun-Phrase
>>> --------NAME
>>> ------------Fred
>>> ----Verb
>>> --------VERB-PAST
>>> ------------saw
>>> ----Noun-Phrase
>>> --------Noun-Phrase
>>> ------------ARTICLE
>>> ----------------a
>>> ------------COUNTABLE-NOUN
>>> ----------------plane
>>> --------VERB-ING
>>> ------------flying
>>> --------Prepositional-Clause
>>> ------------PREPOSITION
>>> ----------------over
>>> ------------Noun-Phrase
>>> ----------------NAME
>>> --------------------Zurich
>>>
>>> Sentence
>>> ----Noun-Phrase
>>> --------NAME
>>> ------------Fred
>>> ----Verb
>>> --------VERB-PAST
>>> ------------saw
>>> ----Noun-Phrase
>>> --------ARTICLE
>>> ------------a
>>> --------COUNTABLE-NOUN
>>> ------------plane
>>> ----Adverbial-Phrase
>>> --------VERB-ING
>>> ------------flying
>>> --------Prepositional-Clause
>>> ------------PREPOSITION
>>> ----------------over
>>> ------------Noun-Phrase
>>> ----------------NAME
>>> --------------------Zurich
>>>
>>> Formal grammars for natural languages do exist, although they're not
>>> perfect, but the problem with multiple grammatically sensible parses
>>> (often millions of trees and more) is much greater than the problem
>>> with nonsensical trees or correct sentences that don't parse at all.
>>>
>>> Lojban was carefully designed to avoid this problem. And it doesn't
>>> have anything to do with {xi PA}. The Lojban grammar specifies XI
>>> clauses unambiguously. Parse trees are unique. Monoparsing is not a
>>> myth. XI clauses may add semantic ambiguity on a different level
>>> than, say, simple {zo'e}, but it doesn't have anything to do with
>>> syntactic ambiguity.
>>
>> It specifies to which head a clause should attach. And since it's
>> {mo'e zo'e}, it's vague to which head it attaches. If the parser you
>> use doesn't allow for that, the only thing that can be done is to
>> provide several possible trees.
>
> It's a feature of a language, not a parser. If English had a pronoun,
> say, 'lar', which would mean 'the subject or the object of the main
> sentence', you could say "Fred saw a plane as lar flew over Zurich",
> which would be ambiguous semantically, but not syntactically.

Even in current theories of English there are a lot of zero morphemes.
What I'm proposing is just another zero morpheme.
This is what And agreed with me on.

>>> {la fred pu viska lo vinji do'e lo se xi vei mo'e zo'e nei poi vofli
>>> ga'u la tsurix} has only one syntax tree, regardless of the number
>>> of possible semantic interpretations.
>>
>> If you applied {mo'e zo'e} to the English sentence, you would still
>> get only one syntax tree.
>
> You can't "apply" {mo'e zo'e} to the English sentence, because it's
> not there. Likewise you don't "apply" {mo'e zo'e} to the Lojban
> sentence. You just parse it, because it's there.
> In English you can have phrases like 'X of Y of Z' which could be
> parsed as '(X of Y) of Z' or 'X of (Y of Z)'. In Lojban it's not
> possible, but you can say "either (X of Y) of Z or X of (Y of Z)",
> which is not syntactically ambiguous. You can't apply "either... or"
> to the English sentence, because you can't parse words which aren't
> there.

As I just said, English parsers "add words that aren't there" all the
time.

>>> In English you can have sentences that are semantically ambiguous
>>> due to syntactic ambiguity. In Lojban you can have sentences with
>>> (roughly) the same semantic ambiguity as the English ones, but
>>> syntactically unambiguous.
>>>
>>>>> > {la fred pu viska lo vinji do'e lo se xi vei mo'e zo'e nei poi
>>>>> > vofli ga'u la tsurix}
>>>>>
>>>>> camxes only produces one parse tree for that.
>>>>
>>>> And for English you don't provide any parses at all.
>>>> Maybe someone should just parse the original English sentence as
>>>> camxes does for the Lojban one?
>>>> I won't be surprised if such a parser for English doesn't exist,
>>>> since those who write them might mix parsing and interpretation.
>>>> The latter would be replacing {mo'e zo'e} with some PA, which will
>>>> immediately lead to several syntactic trees.
>>>>
>>>> So I both disagree and agree with you on whether the English
>>>> sentence has several syntactic trees. If we stop using one term for
>>>> two operations, the contradiction disappears.
>>>>
>>>>> If you think it should produce more than one, raise a bug report.
>>>>
>>>> I'm not aware of any Lojban parsers that perform the interpretation
>>>> operation. In most cases you just need context and one
>>>> interpretation. But this is semantic analysis. Producing all
>>>> possible syntactic trees is a task that is needed less often.
>>>
>>> Camxes is intended to produce all possible syntactic trees, and
>>> there's only one of them for any valid sentence.
>>
>> You may invent a Lojban parser that won't be able to parse {mo'e
>> zo'e}. Then you will need workarounds to output several trees.
>
> XI clauses have an unambiguous syntax, so I don't see how I'd need
> workarounds and several trees. Of course, I could invent a Lojban
> parser that won't be able to parse anything, but what's the point?
> {mo'e zo'e} from the parser's view is just MOhE KOhA. If I can't parse
> it, then I have an incomplete parser.

And this is what I state for English: its current parsers are
incomplete, and further improvements will make polyparsed sentences
monoparsed.

> What you mean sounds rather like a semantic analyzer, which is
> extremely hard for any language, including Lojban.
>
> mu'o mi'e ianek

>>> mu'o mi'e ianek

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.


--001a11335b74b86553050ecf7392--
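
The claim in the thread that the toy English grammar assigns more than one tree to "Fred saw a plane flying over Zurich" can be checked mechanically. The following is only a minimal sketch under stated assumptions: the names `GRAMMAR`, `TOKENS`, `parses` and `render` are hypothetical, the category tags are hand-assigned exactly as in the quoted message, the two spellings "Preposition-Clause" and "Prepositional-Clause" are treated as one nonterminal, and the enumeration is a naive exhaustive span-splitting search, not how camxes or any real English parser works.

```python
from functools import lru_cache
from itertools import combinations

# Toy grammar from the thread; any symbol without an entry is a terminal category.
GRAMMAR = {
    "Sentence": [
        ("Noun-Phrase", "Verb", "Noun-Phrase"),
        ("Noun-Phrase", "Verb", "Noun-Phrase", "Adverbial-Phrase"),
    ],
    "Noun-Phrase": [
        ("NAME",),
        ("ARTICLE", "COUNTABLE-NOUN"),
        ("Noun-Phrase", "VERB-ING", "Prepositional-Clause"),
    ],
    "Verb": [("VERB-PAST",)],
    "Adverbial-Phrase": [("VERB-ING", "Prepositional-Clause")],
    "Prepositional-Clause": [("PREPOSITION", "Noun-Phrase")],
}

# Hand-tagged tokens (word, category), as given in the thread.
TOKENS = (
    ("Fred", "NAME"), ("saw", "VERB-PAST"), ("a", "ARTICLE"),
    ("plane", "COUNTABLE-NOUN"), ("flying", "VERB-ING"),
    ("over", "PREPOSITION"), ("Zurich", "NAME"),
)


@lru_cache(maxsize=None)
def parses(symbol, start, end):
    """Return every tree that derives TOKENS[start:end] from `symbol`."""
    if symbol not in GRAMMAR:  # terminal: must cover exactly one matching token
        if end - start == 1 and TOKENS[start][1] == symbol:
            return ((symbol, TOKENS[start][0]),)
        return ()
    trees = []
    for rhs in GRAMMAR[symbol]:
        # Pick cut points so that every right-hand-side symbol covers at least
        # one token; this also keeps the left-recursive rule from looping.
        for cuts in combinations(range(start + 1, end), len(rhs) - 1):
            bounds = (start, *cuts, end)
            partial = [()]
            for sym, lo, hi in zip(rhs, bounds, bounds[1:]):
                partial = [p + (t,) for p in partial for t in parses(sym, lo, hi)]
            trees.extend((symbol, *children) for children in partial)
    return tuple(trees)


def render(tree, depth=0):
    """Format a tree as lines in the dash-indented style used in the thread."""
    label, *children = tree
    lines = ["----" * depth + label]
    for child in children:
        if isinstance(child, str):  # the word under a terminal category
            lines.append("----" * (depth + 1) + child)
        else:
            lines.extend(render(child, depth + 1))
    return lines


trees = parses("Sentence", 0, len(TOKENS))
print(len(trees), "parse trees")
for tree in trees:
    print("\n".join(render(tree)), end="\n\n")
```

Run as written, this reports two trees for the sentence: one attaches "flying over Zurich" inside the object noun phrase, the other attaches it to the sentence as an adverbial phrase. The sketch only enumerates syntactic structures for a fixed tag sequence; it makes no attempt at the interpretation step that the thread argues about.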