Received-SPF: pass (google.com: domain of jjllambias@gmail.com designates 2a00:1450:400c:c09::22a as permitted sender) client-ip=2a00:1450:400c:c09::22a;
MIME-Version: 1.0
In-Reply-To: <CAO7bV+jo6QoobUWfzXU_K8CZPadXG-wxtrBGcJEZX0mLvyzQjw@mail.gmail.com>
References: <CAO7bV+iwi1F+h-vGLteSo0zyVRnoz_=Q3BCGkQqNnBD7Brag1Q@mail.gmail.com>
	<CAO7tK2fnesAreQixfaJ3gw_ZrZs07U6b=dU7BeStKogEsDiXwg@mail.gmail.com>
	<CAO7bV+jo6QoobUWfzXU_K8CZPadXG-wxtrBGcJEZX0mLvyzQjw@mail.gmail.com>
Date: Fri, 25 Dec 2015 11:14:54 -0300
Message-ID: <CAO7tK2cQVgE_bf0D-T7Og8ewKHe63YQjsZBm9_tQsDsPCb4XPQ@mail.gmail.com>
Subject: Re: [lojban] la cmaxes, a minimal morphology parser
From: =?UTF-8?Q?Jorge_Llamb=C3=ADas?= <jjllambias@gmail.com>
To: lojban@googlegroups.com
Content-Type: multipart/alternative; boundary=001a1130c84a5a05980527b99439
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com
X-Spam_score: -1.7
X-Spam_score_int: -16
X-Spam_bar: -

--001a1130c84a5a05980527b99439
Content-Type: text/plain; charset=UTF-8

On Fri, Dec 25, 2015 at 10:59 AM, Gleki Arxokuna <gleki.is.my.name@gmail.com> wrote:
>
>> bgv = [bgv] hgu
>>
>> jz = [jz] hgu
>>
>> cs = [cs] hgv !cs !x
>
> oops, the website wasn't updated. I will fix later. Or you can just clear
> appcache for it.
> " !x" isn't necessary here at all. i removed it:
> http://mw.lojban.org/extensions/ilmentufa/morfologi.js.peg
>

But then you allow bacxa
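
To make the c/s difference concrete, here is a rough Python sketch of the consonant-pair restrictions as the CLL states them (no doubled consonants, no voiced/unvoiced mixing except with l, m, n, r, no two of c, j, s, z together, and the specific pairs cx, kx, xc, xk, mz forbidden). The names `permitted_pair` and `followers` are made up for illustration and are not taken from morfologi.js.peg.

```python
# Consonant-pair restrictions as stated in the CLL (an assumption of this
# sketch; nothing here is copied from morfologi.js.peg).
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SONORANTS = set("lmnr")  # exempt from the voicing rule
SIBILANTS = set("cjsz")
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}
CONSONANTS = VOICED | UNVOICED | SONORANTS


def permitted_pair(pair: str) -> bool:
    """Return True if the two-consonant string is a permissible pair."""
    if len(pair) != 2 or not set(pair) <= CONSONANTS:
        return False
    first, second = pair
    if first == second:  # doubled consonant, e.g. "bb"
        return False
    letters = {first, second}
    if letters & VOICED and letters & UNVOICED:  # mixed voicing, e.g. "zk"
        return False
    if first in SIBILANTS and second in SIBILANTS:  # e.g. "cs"
        return False
    return pair not in FORBIDDEN  # e.g. "cx"


def followers(consonant: str) -> set:
    """Consonants that may directly follow the given consonant."""
    return {c for c in CONSONANTS if permitted_pair(consonant + c)}


print(permitted_pair("cx"))  # False: "bacxa" must be rejected
print(permitted_pair("sx"))  # True: "vasxu" must be accepted
print(sorted(followers("s") - followers("c")))  # ['x']
```

Under these assumptions c and s differ in exactly one follower, x, so a shared rule for [cs] has to be wrong for one of them whichever way the x check goes.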


>> pf = [pf] hgv
>>
>> Unfortunately, you can't do this. The !x after cs is wrong because it
>> will reject, for example, "vasxu". But more importantly, no consonant follows
>> the same rules as any other consonant. You removed the restriction against
>> double consonants, so "babba" will parse as a gismu.
>>
>> The only two letters that share identical rules are e and o.
>
> Indeed, thanks for noticing. I need to explain this parser better because
> it changes something in ideology.
>
> Namely, it preprocesses input using a bunch of regexes.
> So {zk} turns into {zyk}, {bb} into {byb} etc.
> The idea is that the parser expects correct language in its input and
> determines word classes, but does not show mistakes in the input.
>

If only correct language is expected as input, then why have any
restrictions at all? Why is the !cs needed, for example?

And what's the point of using a preparser to handle things that PEG can
handle just fine? It seems that you're making the morphology harder, not
easier, to grasp by hiding some things in the preparser.
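
As a small illustration of the two approaches, the Python sketch below sets a hypothetical preparser step (one regex that inserts y between doubled consonants, standing in for the "bunch of regexes" described above) against the same restriction written as a negative lookahead, which is what a `!` predicate does in PEG. Both `preprocess` and `is_cvccv` are invented names, and the pattern covers only the doubled-consonant rule for the CVCCV shape, not the full morphology.

```python
import re

C = "[bcdfgjklmnprstvxz]"  # Lojban consonants
V = "[aeiou]"              # Lojban vowels (y deliberately excluded)


def preprocess(text: str) -> str:
    """Hypothetical preparser step: break up doubled consonants with y."""
    return re.sub(rf"({C})(?=\1)", r"\1y", text)


# The same restriction kept inside the pattern: the negative lookahead
# refuses a doubled consonant in the CC slot, much like `!` in a PEG rule.
CVCCV = re.compile(rf"{C}{V}(?!({C})\1){C}{C}{V}")


def is_cvccv(word: str) -> bool:
    """True if word has the CVCCV shape with no doubled medial consonant."""
    return CVCCV.fullmatch(word) is not None


print(preprocess("babba"))  # babyba
print(is_cvccv("babba"))    # False
print(is_cvccv("vasxu"))    # True
```

With the lookahead, the rule that excludes "babba" is visible in the pattern itself. With the preparser, "babba" has already become "babyba" before the grammar runs, so someone reading only the grammar cannot tell that doubled consonants are disallowed.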

mu'o mi'e xorxes

-- 
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at https://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.

--001a1130c84a5a05980527b99439--