From lojban+bncCOjSjrXVGBDDi6zmBBoEvI1cFg@googlegroups.com Fri Oct 29 10:35:18 2010
MIME-Version: 1.0
In-Reply-To: <20101029170344.GB47249@alice.local>
References: <AANLkTik2apwYUT40-wMWcd_Wjj4B4aERKNsHVq_MCf=P@mail.gmail.com>
	<20101029170344.GB47249@alice.local>
Date: Fri, 29 Oct 2010 13:34:56 -0400
Message-ID: <AANLkTimEdWEmcwzgGm6=Fq3tgguQ1K_0uff7MKb5aZLU@mail.gmail.com>
Subject: Re: [lojban] lujvo deconstruction
From: Luke Bergen <lukeabergen@gmail.com>
To: lojban@googlegroups.com
X-Original-Sender: lukeabergen@gmail.com
X-Original-Authentication-Results: gmr-mx.google.com; spf=pass (google.com:
 domain of lukeabergen@gmail.com designates 209.85.214.177 as permitted
 sender) smtp.mail=lukeabergen@gmail.com; dkim=pass (test mode) header.i=@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID: <lojban.googlegroups.com>
List-Post: <http://groups.google.com/group/lojban/post?hl=en_US>, <mailto:lojban@googlegroups.com>
List-Help: <http://groups.google.com/support/?hl=en_US>, <mailto:lojban+help@googlegroups.com>
List-Archive: <http://groups.google.com/group/lojban?hl=en_US>
Sender: lojban@googlegroups.com
List-Subscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+subscribe@googlegroups.com>
List-Unsubscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+unsubscribe@googlegroups.com>
Content-Type: multipart/alternative; boundary=000325550e5a88b6630493c4e025

--000325550e5a88b6630493c4e025
Content-Type: text/plain; charset=ISO-8859-1

Sorry, yes, I was providing very rough pseudocode for my script.  I do look
from left to right.  But since rafsi are always 3 letters (not counting any
' characters, and leaving 4 letter rafsi aside), I take them in chunks of 3.

An example with morsi would be "xamymro".  My code would go like:
grab the leftmost three chars, check for .y'ys and grab a fourth char if
there is a .y'y
look up the rafsi, chop off what you found to be the "leftmost" rafsi, and
loop again with what you have left
Now we're looking at "ymro".
Strip off "y" and we're left with "mro".  Now because I'm assuming that "r",
"l", "m", or "n" followed by a consonant is a buffer consonant, I see "mro"
and think "ok, the 'm' is a buffer consonant, so grab another char so we're
back to a 3 letter rafsi".  I then try to grab whatever comes after "o" and
get a null-pointer or some such.
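
A minimal sketch of the loop just described, in Python, so the failure is
easy to reproduce.  The RAFSI table is a tiny hand-entered stand-in (an
assumption, not the real rafsi list), and the names take_chunk and
naive_split are made up for illustration:

```python
# Tiny hand-entered table, for illustration only; a real script would
# load the complete rafsi list instead.
RAFSI = {"xam": "xajmi", "mro": "morsi", "ba'u": "bacru", "bei": "bevri"}
CONSONANTS = set("bcdfgjklmnprstvxz")


def take_chunk(word):
    """Take characters from the left until 3 non-apostrophe letters are held."""
    chunk, letters = "", 0
    for ch in word:
        if letters == 3:
            break
        chunk += ch
        if ch != "'":
            letters += 1
    return chunk


def naive_split(word):
    """Chunk-of-3 splitter using the [r|l|m|n]C guess described above."""
    parts = []
    while word:
        if word[0] == "y":  # drop a "y" buffer letter
            word = word[1:]
            continue
        # The guess: r/l/m/n followed by a consonant is a buffer letter.
        if len(word) >= 2 and word[0] in "rlmn" and word[1] in CONSONANTS:
            word = word[1:]
        chunk = take_chunk(word)
        if chunk not in RAFSI:
            raise ValueError("no rafsi %r (remaining: %r)" % (chunk, word))
        parts.append(chunk)
        word = word[len(chunk):]
    return parts


print(naive_split("ba'urbei"))  # ['ba'u', 'bei']
# naive_split("xamymro") raises ValueError: the "m" of "mro" is thrown
# away as a buffer letter and only "ro" is left to look up.
```

In Python the "null-pointer" shows up as a failed lookup on "ro" rather than
a crash, but it is the same problem.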

It just occurred to me that I might deal with 4 letter rafsi by keeping in
mind that inside a lujvo they are always followed by "y".  So my revised
"grab leftmost rafsi" code would look something like:

word = "xajmymro"
if (word matches "....y")  // any 4 characters followed by a "y"
  return substring(word, 0, 4)

Then in the calling function I just have to try rafsi+a, rafsi+e, etc.
until one of them matches a gismu.
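
Roughly, in Python, and only as a sketch: the GISMU set here is a small
hand-entered sample (an assumption; the real script would load the full
gismu list), and the function names are hypothetical.  I also assume the
four characters in front of the "y" contain no "y" or "'" themselves.

```python
import re

# Small hand-entered sample, for illustration only.
GISMU = {"xajmi", "morsi", "bacru", "bevri"}


def leading_four_letter_rafsi(word):
    """Return the first 4 letters if they are directly followed by "y"."""
    match = re.match(r"([^y']{4})y", word)
    return match.group(1) if match else None


def gismu_for(rafsi4):
    """Try rafsi+a, rafsi+e, ... until one of them is a known gismu."""
    for vowel in "aeiou":
        if rafsi4 + vowel in GISMU:
            return rafsi4 + vowel
    return None


rafsi4 = leading_four_letter_rafsi("xajmymro")
print(rafsi4, gismu_for(rafsi4))             # xajm xajmi
print(leading_four_letter_rafsi("xamymro"))  # None, "xam" is a 3 letter rafsi
```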

I'm still stuck on the buffer consonant problem though.

It feels wrong to use guesswork like "if you see [r|l|m|n]C then check to
see if it's a valid rafsi, if it's not, strip off the [r|l|m|n], grab
another char from the right, and look THAT up and see if it's a rafsi".

Here's a non-code way to think of the problem.  How would a parser figure
out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat ro
ci"?
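
One way to avoid guessing would be to let the rafsi table decide: try every
prefix that is a known rafsi, optionally skip one buffer letter, and recurse
on the rest, collecting every split that reaches the end of the word.  This
is only a sketch under assumptions: RAFSI and GISMU are tiny hand-entered
samples, HYPHENS ("y", "r", "n") is my guess at the buffer letters to accept,
and the function name is hypothetical.

```python
# Tiny hand-entered samples, for illustration only.
RAFSI = {"co'a", "mro", "bra", "xam", "ba'u", "bei"}
GISMU = {"troci", "morsi", "barda", "xajmi", "bacru", "bevri"}
HYPHENS = ("y", "r", "n")  # assumed buffer letters; widen or narrow as needed


def segmentations(word):
    """Return every split of word into known rafsi (plus a final gismu)."""
    found = []
    if word in GISMU:  # a whole gismu may close the word
        found.append([word])
    for end in range(1, len(word) + 1):
        head, rest = word[:end], word[end:]
        if head not in RAFSI:
            continue
        if not rest:
            found.append([head])
            continue
        # Continue directly, or after skipping one buffer letter.
        candidates = [rest]
        if rest[0] in HYPHENS and len(rest) > 1:
            candidates.append(rest[1:])
        for candidate in candidates:
            for tail in segmentations(candidate):
                found.append([head] + tail)
    return found


print(segmentations("co'amrobratroci"))  # [["co'a", 'mro', 'bra', 'troci']]
print(segmentations("xamymro"))          # [['xam', 'mro']]
print(segmentations("ba'urbei"))         # [["ba'u", 'bei']]
```

With the complete rafsi list this returns every split the table allows, so
whether a given word comes back with exactly one answer still depends on the
table and on which buffer letters are accepted.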

On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:

> On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> >    When I first started learning lojban I wrote up a quick'n dirty
> >    script to make looking up words faster and easier. gismu and cmavo
> >    were easy, but I could never figure out lujvo. So I'm taking another
> >    stab at it. I currently have something that works in the general
> >    cases of {bajdri}, {ba'udri}, and {bagypau}. But currently I'm not
> >    sure how to deal with 4 letter rafsi and non "y" buffer letters.
> >    To deal with the non "y" buffer letters I thought I could just say:
> >    strip all "y" from the word
> >    get first three non "'" chars
> >    if the first letter is "r", "l", "m", or "n" and the second letter
> >    is a consonant, then chop off the first letter and grab another
> >    letter from the right
> >    (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
> >    handling ba'u in the first iteration) end up with "rbe" and due to
> >    the above step, I'd strip off the "r" and grab the next letter thus
> >    ending with "bei" which is the right result).
> >    But this produces strange results because there ARE cases where
> >    buffer letters are followed by consonants (morsi for instance).
> >    Is there a way to un-ambiguously and algorithmically break a lujvo
> >    down into its component gismu?
> >
>
> I haven't rigorously looked at this, so please excuse me if I'm way
> off base.
>
> What if you start at the left side of the word and match characters
> until you get a matching rafsi, then look for optional buffer
> characters before matching your next rafsi, &c?  You could be much
> more sophisticated by adding detection for valid lerfu clustering
> to throw out what would otherwise be an ambiguous case.
>
> It sounds like you're working top down on the problem rather than
> going from left to right, but I don't know what is wrong with my
> suggestion yet.
>
> I see you've provided 3 simple examples, but can you provide an
> example for morsi which you mention at the end?
>
> -Alan
> --
> .i ko djuno fi le do sevzi
>
> --
> You received this message because you are subscribed to the Google Groups
> "lojban" group.
> To post to this group, send email to lojban@googlegroups.com.
> To unsubscribe from this group, send email to
> lojban+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/lojban?hl=en.
>
>



--000325550e5a88b6630493c4e025--