From: Gleki Arxokuna
Date: Fri, 25 Dec 2015 17:31:39 +0300
Subject: Re: [lojban] la cmaxes, a minimal morphology parser
To: "lojban@googlegroups.com"

2015-12-25 17:14 GMT+03:00 Jorge Llambías <jjllambias@gmail.com>:
>
> On Fri, Dec 25, 2015 at 10:59 AM, Gleki Arxokuna <
> gleki.is.my.name@gmail.com> wrote:
>>
>>> bgv = [bgv] hgu
>>>
>>> jz = [jz] hgu
>>>
>>> cs = [cs] hgv !cs !x
>>
>> Oops, the website wasn't updated. I will fix it later, or you can just
>> clear the appcache for it.
>> " !x" isn't necessary here at all. I removed it:
>> http://mw.lojban.org/extensions/ilmentufa/morfologi.js.peg
>
> But then you allow bacxa.
>
>>> pf = [pf] hgv
>>>
>>> Unfortunately, you can't do this. The !x after cs is wrong because it
>>> will reject, for example, "vasxu". But more importantly, no consonant
>>> follows the same rules as any other consonant. You removed the
>>> restriction against double consonants, so "babba" will parse as a gismu.
>>>
>>> The only two letters that share identical rules are e and o.
>>
>> Indeed, thanks for noticing. I need to explain this parser better,
>> because it takes a somewhat different approach.
>>
>> Namely, it preprocesses the input using a bunch of regexes,
>> so {zk} turns into {zyk}, {bb} into {byb}, etc.
>> The idea is that the parser expects correct language as its input and
>> determines word classes, but does not point out mistakes in the input.
>
> If only correct language is expected as input, then why have any
> restrictions at all? Why is the !cs needed, for example?

Right, it isn't needed either.

> And what's the point of handling in a preparser things that PEG can
> handle just fine? It seems that you're making the morphology harder, not
> easier, to grasp by hiding some things in the preparser.

Well, then such a parser could simply be forked into a separate one for
humans learning the morphology. The current one is mainly meant for
quickly recovering the word classes of words that are assumed to be
grammatical, and for restoring the spaces in cmavo compounds.

> mu'o mi'e xorxes
>
> --
> You received this message because you are subscribed to the Google Groups
> "lojban" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to lojban+unsubscribe@googlegroups.com.
> To post to this group, send email to lojban@googlegroups.com.
> Visit this group at https://groups.google.com/group/lojban.
> For more options, visit https://groups.google.com/d/optout.
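To make the preprocessing idea in this thread concrete (so that {zk} becomes {zyk} and {bb} becomes {byb} before parsing), here is a minimal sketch. It is written in Python for illustration only and uses a simple character scan in place of the regexes mentioned above. The names insert_y and is_permissible_pair are hypothetical and are not taken from ilmentufa or morfologi.js.peg. The sketch assumes the standard Lojban permissible-consonant-pair rules: no doubled consonants, no voiced/unvoiced mix, no two of c, j, s, z, and none of cx, kx, xc, xk, mz. Consonant triples and other word-level restrictions are not handled.

```python
# Hypothetical preprocessing sketch (not ilmentufa's code): insert "y"
# between two adjacent consonants that may not stand together in Lojban.

VOICED = set("bdgvjz")
UNVOICED = set("ptkfcsx")
SIBILANTS = set("cjsz")
CONSONANTS = VOICED | UNVOICED | set("lmnr")  # l, m, n, r have no voicing pair
# Pairs that are forbidden outright even though they pass the other tests.
BANNED = {"cx", "kx", "xc", "xk", "mz"}


def is_permissible_pair(a: str, b: str) -> bool:
    """Return True if consonant a may directly precede consonant b."""
    if a == b:  # doubled consonant, e.g. "bb"
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False  # voicing mismatch, e.g. "zk"
    if a in SIBILANTS and b in SIBILANTS:
        return False  # two of c, j, s, z, e.g. "cs"
    return a + b not in BANNED


def insert_y(text: str) -> str:
    """Insert 'y' between every impermissible pair of adjacent consonants."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS and nxt in CONSONANTS and not is_permissible_pair(ch, nxt):
            out.append("y")
    return "".join(out)


if __name__ == "__main__":
    for word in ["zk", "bb", "babba", "bacxa", "vasxu"]:
        print(word, "->", insert_y(word))
```

With these rules, "babba" becomes "babyba" and "bacxa" becomes "bacyxa", while "vasxu" passes through unchanged, which matches the cases discussed above. As xorxes points out, the same constraints can instead be written directly in the PEG with negative lookahead, which keeps them visible in the grammar rather than in a separate preprocessing step.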