From: Jorge Llambías
To: lojban@googlegroups.com
Date: Fri, 25 Dec 2015 11:14:54 -0300
Subject: Re: [lojban] la cmaxes, a minimal morphology parser

On Fri, Dec 25, 2015 at 10:59 AM, Gleki Arxokuna wrote:

>
> bgv = [bgv] hgu
>>
>> jz = [jz] hgu
>>
>> cs = [cs] hgv !cs !x
>>
>> oops, the website wasn't updated. I will fix it later. Or you can just clear
> appcache for it.
> " !x" isn't necessary here at all. I removed it:
> http://mw.lojban.org/extensions/ilmentufa/morfologi.js.peg
> But then you allow bacxa
> pf = [pf] hgv
>>
>>
>> Unfortunately, you can't do this. The !x after cs is wrong because it
>> will reject, for example, "vasxu". But more importantly, no consonant follows
>> the same rules as any other consonant. You removed the restriction against
>> double consonants, so "babba" will parse as a gismu.
>>
>> The only two letters that share identical rules are e and o.
>>
>
> Indeed, thanks for noticing. I need to explain this parser better because
> it changes something in the ideology.
>
> Namely, it preprocesses input using a bunch of regexes.
> So {zk} turns into {zyk}, {bb} into {byb}, etc.
> The idea is that the parser expects correct language in its input and
> determines word classes, but does not show mistakes in the input.
>

If only correct language is expected as input, then why have any restrictions at all? Why is the !cs needed, for example?

And what's the point of handling in a preparser things that PEG can handle just fine? It seems that you're making the morphology harder, not easier, to grasp by hiding some things in the preparser.

mu'o mi'e xorxes

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at https://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.
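The objection to `cs = [cs] hgv !cs !x` can be reproduced in a few lines. This is a hypothetical illustration, not code from cmaxes or ilmentufa. It uses Python's regex negative lookahead `(?!x)` as a stand-in for PEG's `!x`, models only the `x` restriction, and ignores the `hgv` part of the rule. The names `SHARED_CS`, `NO_LOOKAHEAD_CS`, `C_RULE` and `S_RULE` are made up for the example.

```python
import re

# Hypothetical stand-ins for PEG rules; (?!x) plays the role of PEG's `!x`.
SHARED_CS = re.compile(r"[cs](?!x)")   # c and s share one rule: [cs] !x
NO_LOOKAHEAD_CS = re.compile(r"[cs]")  # the same shared rule with !x removed
C_RULE = re.compile(r"c(?!x)")         # separate rule for c: cx is forbidden
S_RULE = re.compile(r"s")              # separate rule for s: sx is allowed


def accepts(pattern: re.Pattern, word: str, pos: int) -> bool:
    """True if `pattern` matches starting at word[pos], like a PEG rule."""
    return pattern.match(word, pos) is not None


def split_accepts(word: str, pos: int) -> bool:
    """Dispatch to the per-consonant rule for the letter at word[pos]."""
    rule = C_RULE if word[pos] == "c" else S_RULE
    return accepts(rule, word, pos)


# In both words the consonant under test sits at index 2.
for word in ("vasxu", "bacxa"):
    print(word,
          accepts(SHARED_CS, word, 2),        # rejects both: vasxu wrongly
          accepts(NO_LOOKAHEAD_CS, word, 2),  # accepts both: bacxa wrongly
          split_accepts(word, 2))             # accepts vasxu, rejects bacxa
```

Only the separate rules for `c` and `s` classify both words correctly, which is the point that consonants do not share restrictions.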
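The claim that each consonant needs its own restrictions follows from the permissible consonant-pair rules described in The Complete Lojban Language's chapter on phonology. These rules are: no doubled consonants; no mixing of voiced and unvoiced consonants, with l, m, n and r exempt; no two of c, j, s, z together; and none of the specific pairs cx, kx, xc, xk, mz. The sketch below encodes those rules in a hypothetical helper, `permissible_pair`, and checks every adjacent consonant pair in a word. It covers only medial pairs, not the stricter initial-pair list or the full gismu shape.

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
CONSONANTS = VOICED | UNVOICED | set("lmnr")
SIBILANTS = set("cjsz")
FORBIDDEN_PAIRS = {"cx", "kx", "xc", "xk", "mz"}


def permissible_pair(a: str, b: str) -> bool:
    """Check one consonant pair against the permissible medial-pair rules."""
    if a not in CONSONANTS or b not in CONSONANTS:
        raise ValueError(f"not a consonant pair: {a}{b}")
    if a == b:  # no doubled consonants, e.g. the "bb" in *babba
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False  # no voiced/unvoiced mix; l, m, n, r are in neither set
    if a in SIBILANTS and b in SIBILANTS:
        return False  # c, j, s, z may not be adjacent to each other
    return a + b not in FORBIDDEN_PAIRS  # the specific forbidden pairs


def medial_pairs_ok(word: str) -> bool:
    """True if every adjacent consonant pair in `word` is permissible."""
    for a, b in zip(word, word[1:]):
        if a in CONSONANTS and b in CONSONANTS and not permissible_pair(a, b):
            return False
    return True


for word in ("vasxu", "bacxa", "babba"):
    print(word, medial_pairs_ok(word))
```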
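Gleki describes the preparser only by example ({zk} becomes {zyk}, {bb} becomes {byb}), and the actual regexes are not shown in the thread. The Python sketch below is a hypothetical reconstruction of that idea. It handles only doubled consonants and voiced/unvoiced clashes, inserting y between them. It also makes xorxes's concern concrete: after preprocessing, "babba" no longer contains a doubled consonant, so the grammar itself never sees that restriction.

```python
import re

# Hypothetical preprocessing rules in the spirit of the thread's examples;
# they are not the regexes actually used by cmaxes.
CONSONANTS = "bcdfgjklmnprstvxz"
VOICED = "bdgjvz"
UNVOICED = "cfkpstx"

RULES = [
    # Doubled consonant: bb -> byb
    (re.compile(rf"([{CONSONANTS}])\1"), r"\1y\1"),
    # Voiced consonant before an unvoiced one: zk -> zyk
    (re.compile(rf"([{VOICED}])(?=[{UNVOICED}])"), r"\1y"),
    # Unvoiced consonant before a voiced one: kz -> kyz
    (re.compile(rf"([{UNVOICED}])(?=[{VOICED}])"), r"\1y"),
]


def preprocess(text: str) -> str:
    """Insert y into the consonant clusters matched by RULES, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


for word in ("zk", "bb", "babba", "vasxu"):
    print(word, "->", preprocess(word))
```

Whether this is simpler than stating the same restrictions directly in the PEG with negative lookahead is exactly what the thread disputes.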

