From lojban+bncCOjSjrXVGBDVjKzmBBoEzDWFmA@googlegroups.com Fri Oct 29 10:37:39 2010
Received: from mail-gw0-f61.google.com ([74.125.83.61])
	by chain.digitalkingdom.org with esmtp (Exim 4.72)
	(envelope-from <lojban+bncCOjSjrXVGBDVjKzmBBoEzDWFmA@googlegroups.com>)
	id 1PBstT-0001sw-MZ; Fri, 29 Oct 2010 10:37:39 -0700
Received: by gwj20 with SMTP id 20sf4764896gwj.16
        for <multiple recipients>; Fri, 29 Oct 2010 10:37:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=beta;
        h=domainkey-signature:received:x-beenthere:received:received:received
         :received:received-spf:received:mime-version:received:received
         :in-reply-to:references:date:message-id:subject:from:to
         :x-original-sender:x-original-authentication-results:reply-to
         :precedence:mailing-list:list-id:list-post:list-help:list-archive
         :sender:list-subscribe:list-unsubscribe:content-type;
        bh=h6FTNaXXxD9VqBqrCRQKn9gn50ojvtzBaVwTZexRmIs=;
        b=p5Qnl+mN40dHlWXp3+wokLMxp9DWbluxBBIdcDuTDYQIqnbUVeYdyXdKra2kBqYanN
         z0IolaQsDY1ubK6A7c4ZYGvB4F+tdC/Jt8rLclVtPTHqSZPex46MjdHdHa2x/oz+r/Ar
         IVuDb7QKMSuWGWXovkjRoMXZzmnIH+OQ56ssg=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlegroups.com; s=beta;
        h=x-beenthere:received-spf:mime-version:in-reply-to:references:date
         :message-id:subject:from:to:x-original-sender
         :x-original-authentication-results:reply-to:precedence:mailing-list
         :list-id:list-post:list-help:list-archive:sender:list-subscribe
         :list-unsubscribe:content-type;
        b=pLuhbkUGr3Y930p7v5ldyUSyC/BTINEz4HUCU25Xk4x/C80l3EGYv/Zc338zLGXNaq
         6E549qrQEp770jmQkxnYiDYz257tf4J8x/lKFSwu4d74rUJXJueqqPTUfNSkyAL2drBv
         wnzLwo1F7VybmGE6q1BaSxb1X9A+5cxLJiWGE=
Received: by 10.90.60.19 with SMTP id i19mr365186aga.25.1288373845459;
        Fri, 29 Oct 2010 10:37:25 -0700 (PDT)
X-BeenThere: lojban@googlegroups.com
Received: by 10.100.231.3 with SMTP id d3ls1039438anh.7.p; Fri, 29 Oct 2010
 10:37:24 -0700 (PDT)
Received: by 10.100.123.19 with SMTP id v19mr3591421anc.58.1288373844881;
        Fri, 29 Oct 2010 10:37:24 -0700 (PDT)
Received: by 10.100.123.19 with SMTP id v19mr3591420anc.58.1288373844855;
        Fri, 29 Oct 2010 10:37:24 -0700 (PDT)
Received: from mail-gw0-f46.google.com (mail-gw0-f46.google.com [74.125.83.46])
        by gmr-mx.google.com with ESMTP id x32si879357ana.3.2010.10.29.10.37.23;
        Fri, 29 Oct 2010 10:37:23 -0700 (PDT)
Received-SPF: pass (google.com: domain of lukeabergen@gmail.com designates 74.125.83.46 as permitted sender) client-ip=74.125.83.46;
Received: by gwj21 with SMTP id 21so2259243gwj.33
        for <lojban@googlegroups.com>; Fri, 29 Oct 2010 10:37:23 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.42.26.84 with SMTP id e20mr9748427icc.129.1288373843643; Fri,
 29 Oct 2010 10:37:23 -0700 (PDT)
Received: by 10.231.149.14 with HTTP; Fri, 29 Oct 2010 10:37:23 -0700 (PDT)
In-Reply-To: <AANLkTimEdWEmcwzgGm6=Fq3tgguQ1K_0uff7MKb5aZLU@mail.gmail.com>
References: <AANLkTik2apwYUT40-wMWcd_Wjj4B4aERKNsHVq_MCf=P@mail.gmail.com>
	<20101029170344.GB47249@alice.local>
	<AANLkTimEdWEmcwzgGm6=Fq3tgguQ1K_0uff7MKb5aZLU@mail.gmail.com>
Date: Fri, 29 Oct 2010 13:37:23 -0400
Message-ID: <AANLkTim4OyJoDtdJz_gopRdJrtg-4oYgZ1MgMBp0MLD+@mail.gmail.com>
Subject: Re: [lojban] lujvo deconstruction
From: Luke Bergen <lukeabergen@gmail.com>
To: lojban@googlegroups.com
X-Original-Sender: lukeabergen@gmail.com
X-Original-Authentication-Results: gmr-mx.google.com; spf=pass (google.com:
 domain of lukeabergen@gmail.com designates 74.125.83.46 as permitted sender)
 smtp.mail=lukeabergen@gmail.com; dkim=pass (test mode) header.i=@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID: <lojban.googlegroups.com>
List-Post: <http://groups.google.com/group/lojban/post?hl=en_US>, <mailto:lojban@googlegroups.com>
List-Help: <http://groups.google.com/support/?hl=en_US>, <mailto:lojban+help@googlegroups.com>
List-Archive: <http://groups.google.com/group/lojban?hl=en_US>
Sender: lojban@googlegroups.com
List-Subscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+subscribe@googlegroups.com>
List-Unsubscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+unsubscribe@googlegroups.com>
Content-Type: multipart/alternative; boundary=20cf303f6b2049cd1d0493c4e942

--20cf303f6b2049cd1d0493c4e942
Content-Type: text/plain; charset=ISO-8859-1

Actually I guess that was a bad example at the end because a lujvo ending
with "rat" would definitely be wrong.  But you get where I'm going with it.

On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <lukeabergen@gmail.com> wrote:

> Sorry, yes, I was providing very rough pseudocode for my script.  I do look
> from left to right.  But since rafsi are always 3 letters (minus any
> ' characters and excluding 4 letter rafsi), I take them in chunks of 3.
>
> An example with morsi would be "xamymro".  My code would go like this:
> grab the leftmost three chars, check for .y'y (apostrophes), and grab a
> fourth char if there is a .y'y;
> look up the rafsi, chop off what you found to be the "leftmost" rafsi, and
> loop again with what you have left.
> Now we're looking at "ymro".
> Strip off "y" and we're left with "mro".  Now because I'm assuming that
> "r", "l", "m", or "n" followed by a consonant is a buffer consonant, I see
> "mro" and think "ok, the 'm' is a buffer consonant, so grab another char so
> we're back to a 3 letter rafsi".  I then try to grab whatever comes after
> "o" and get a null-pointer or some such.
>
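
A rough Python sketch of the loop described above, for illustration only. It skips the rafsi lookup and just cuts chunks, the r/l/m/n buffer set is the assumption from the message rather than anything checked against the grammar, and the names (naive_split, take_chunk) are made up here.

```python
VOWELS = set("aeiou")
BUFFERS = set("rlmn")  # assumption copied from the message, not from the grammar


def is_consonant(char):
    # "y" and the apostrophe are treated as neither vowel nor consonant here.
    return char.isalpha() and char not in VOWELS and char != "y"


def take_chunk(word):
    """Cut three letters off the left, or four characters if an apostrophe is among them."""
    size = 4 if "'" in word[:3] else 3
    if len(word) < size:
        # This is where the original script hit its null-pointer.
        raise ValueError(f"ran out of letters in {word!r}")
    return word[:size], word[size:]


def naive_split(lujvo):
    """Chunk a lujvo left to right using the [r|l|m|n]C guess. No rafsi lookup is done."""
    chunks = []
    rest = lujvo
    while rest:
        # After the first chunk, guess that r/l/m/n before a consonant is a buffer letter.
        if chunks and len(rest) >= 2 and rest[0] in BUFFERS and is_consonant(rest[1]):
            rest = rest[1:]
        chunk, rest = take_chunk(rest)
        chunks.append(chunk)
        rest = rest.lstrip("y")  # drop a "y" hyphen before the next chunk
    return chunks
```

With this, naive_split("ba'urbei") comes out as ["ba'u", "bei"], but naive_split("xamymro") drops the "m" of "mro" and then raises ValueError on the leftover "ro", which is the failure described above.
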
> It just occurred to me that I might deal with 4 letter rafsi by keeping in
> mind that they always end with "y".  So my revised "grab leftmost rafsi"
> code would look something like:
>
> def grab_leftmost_rafsi(word):  # e.g. word = "xajmymro"
>     # a 4 letter rafsi is any 4 characters followed by a "y"
>     if len(word) > 4 and word[4] == "y":
>         return word[:4]
>
> Then in the calling function I just have to try rafsi+a, rafsi+e, etc.
> until one of them matches a gismu.
>
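
A small Python sketch of that lookup, under loud assumptions: GISMU is a three-word stand-in for the real gismu list, and both function names are invented for this example.

```python
# Tiny stand-in for the full gismu list (assumption: the real script loads all of them).
GISMU = {"xajmi", "morsi", "xamgu"}


def leading_four_letter_rafsi(word):
    """Return the first four letters if they are followed by "y", else None."""
    if len(word) > 4 and word[4] == "y" and "'" not in word[:4]:
        return word[:4]
    return None


def gismu_for(rafsi4, gismu=GISMU):
    """Try rafsi+a, rafsi+e, ... and return the first candidate that is a known gismu."""
    for vowel in "aeiou":
        candidate = rafsi4 + vowel
        if candidate in gismu:
            return candidate
    return None
```

For "xajmymro" the first function gives "xajm" and the lookup then finds "xajmi". For "xamymro" the fifth character is "m", so the first function returns None and the 3 letter path has to handle it.
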
> I'm still stuck on the buffer consonant problem though.
>
> It feels wrong to use guesswork like "if you see [r|l|m|n]C then check to
> see if it's a valid rafsi, if it's not, strip off the [r|l|m|n], grab
> another char from the right, and look THAT up and see if it's a rafsi".
>
> Here's a non-code way to think of the problem.  How would a parser figure
> out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat ro
> ci"?
>
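
One way to look at that question in code is to stop guessing and list every split a given table allows. The Python sketch below is only that, a sketch: TOY_RAFSI is built from the pieces in the two readings above and has not been checked against the real rafsi list, the hyphen letters are the y/r/l/m/n assumption from earlier in this thread, and the function name is made up.

```python
# Pieces taken from the two candidate readings; NOT checked against the real rafsi list.
TOY_RAFSI = {"co'a", "mro", "bra", "troci", "rob", "rat", "ro", "ci"}


def segmentations(word, rafsi, hyphens="yrlmn"):
    """Yield every split of `word` into entries of `rafsi`.

    Between two pieces, one letter from `hyphens` may be skipped as a buffer letter.
    The hyphen set is this thread's assumption, not a statement of the grammar.
    """
    def walk(pos, parts):
        if pos == len(word):
            yield list(parts)
            return
        starts = [pos]
        # After at least one piece, the next letter may be a buffer rather than content.
        if parts and word[pos] in hyphens:
            starts.append(pos + 1)
        for start in starts:
            for piece in sorted(rafsi):
                if word.startswith(piece, start):
                    yield from walk(start + len(piece), parts + [piece])

    yield from walk(0, [])
```

Running list(segmentations("co'amrobratroci", TOY_RAFSI)) returns more than one split, including both ["co'a", "mro", "bra", "troci"] and ["co'a", "rob", "rat", "ro", "ci"]. So with a table like this one the word really is ambiguous to a pure lookup, and something beyond the table (rules about which rafsi shapes may appear where) has to throw out the extra readings.
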
> On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <
> alyn.post@lodockikumazvati.org> wrote:
>
>> On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>> >    When I first started learning lojban I wrote up a quick'n dirty
>> >    script to make looking up words faster and easier. gismu and cmavo
>> >    were easy, but I could never figure out lujvo. So I'm taking another
>> >    stab at it. I currently have something that works in the general
>> >    cases of {bajdri}, {ba'udri}, and {bagypau}. But currently I'm not
>> >    sure how to deal with 4 letter rafsi and non "y" buffer letters.
>> >    To deal with the non "y" buffer letters I thought I could just say:
>> >    strip all "y" from the word
>> >    get first three non "'" chars
>> >    if the first letter is "r", "l", "m", or "n" and the second letter is
>> >    a consonant, then chop off the first letter and grab another letter
>> >    from the right
>> >    (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>> >    handling ba'u in the first iteration) end up with "rbe" and due to
>> >    the above step, I'd strip off the "r" and grab the next letter thus
>> >    ending with "bei" which is the right result).
>> >    But this produces strange results because there ARE cases where
>> >    buffer letters are followed by consonants (morsi for instance).
>> >    Is there a way to un-ambiguously and algorithmically break a lujvo
>> >    down into its component gismu?
>> >
>>
>> I haven't rigorously looked at this, so please excuse me if I'm way
>> off base.
>>
>> What if you start at the left side of the word and match characters
>> until you get a matching rafsi, then look for optional buffer
>> characters before matching your next rafsi, &c?  You could be much
>> more sophisticated by adding detection for valid lerfu clustering
>> to throw out what would otherwise be an ambiguous case.
>>
>> It sounds like you're working top down on the problem rather than
>> going from left to right, but I don't know what is wrong with my
>> suggestion yet.
>>
>> I see you've provided 3 simple examples, but can you provide an
>> example for morsi which you mention at the end?
>>
>> -Alan
>> --
>> .i ko djuno fi le do sevzi
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "lojban" group.
>> To post to this group, send email to lojban@googlegroups.com.
>> To unsubscribe from this group, send email to
>> lojban+unsubscribe@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/lojban?hl=en.
>>
>>
>



--20cf303f6b2049cd1d0493c4e942--