From lojban+bncCOjSjrXVGBDVjKzmBBoEzDWFmA@googlegroups.com Fri Oct 29 10:37:39 2010
MIME-Version: 1.0
References: <20101029170344.GB47249@alice.local>
Date: Fri, 29 Oct 2010 13:37:23 -0400
Subject: Re: [lojban] lujvo deconstruction
From: Luke Bergen
To: lojban@googlegroups.com
X-Original-Sender: lukeabergen@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com
Content-Type: multipart/alternative; boundary=20cf303f6b2049cd1d0493c4e942

--20cf303f6b2049cd1d0493c4e942
Content-Type: text/plain; charset=ISO-8859-1

Actually I guess that was a bad example at the end because a lujvo ending with "rat" would definitely be wrong. But you get where I'm going with it.

On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen wrote:

> Sorry, yes, I was providing very rough pseudocode for my script. I do look
> from left to right. But since rafsi are always 3 letters (minus any
> ' characters and excluding 4-letter rafsi), I take them in chunks of 3.
>
> An example with morsi would be "xamymro". My code would go like:
> grab leftmost three chars, check for .y'ys and grab a fourth char if there
> is a .y'y
> look up the rafsi, chop off what you found to be the "leftmost" rafsi and
> loop again with what you have left
> Now we're looking at "ymro"
> Strip off "y" and we're left with "mro". Now because I'm assuming that
> "r", "l", "m", or "n" followed by a consonant is a buffer consonant, I see
> "mro" and think "ok, the 'm' is a buffer consonant so grab another char so
> we're back to a 3-letter rafsi". I then try to grab whatever comes after
> "o" and get a null-pointer or some such.
>
> It just occurred to me that I might deal with 4-letter rafsi by keeping in
> mind that they are always followed by "y". So my revised "grab leftmost
> rafsi" code would look something like:
>
> word = "xajmymro"
> if (word matches "....y") // i.e. any 4 characters followed by a "y"
>   return substring(word, 0, 4)
>
> Then in the calling function I just have to try rafsi+a, rafsi+e, etc.
> until I find one that is a gismu.
>
> I'm still stuck on the buffer consonant problem though.
>
> It feels wrong to use guesswork like "if you see [r|l|m|n]C then check to
> see if it's a valid rafsi, if it's not, strip off the [r|l|m|n], grab
> another char from the right, and look THAT up and see if it's a rafsi".
>
> Here's a non-code way to think of the problem. How would a parser figure
> out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat ro
> ci"?
>
> On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:
>
>> On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>> > When I first started learning lojban I wrote up a quick'n dirty script to
>> > make looking up words faster and easier. gismu and cmavo were easy, but I
>> > could never figure out lujvo. So I'm taking another stab at it. I
>> > currently have something that works in the general cases of {bajdri},
>> > {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
>> > letter rafsi and non "y" buffer letters.
>> > To deal with the non "y" buffer letters I thought I could just say:
>> > strip all "y" from the word
>> > get first three non "'" chars
>> > if the first letter is "r", "l", "m", or "n" and the second letter is a
>> > consonant, then chop off the first letter and grab another letter from the
>> > right
>> > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>> > handling ba'u in the first iteration) end up with "rbe" and due to the
>> > above step, I'd strip off the "r" and grab the next letter thus ending
>> > with "bei" which is the right result).
>> > But this produces strange results because there ARE cases where buffer
>> > letters are followed by consonants (morsi for instance).
>> > Is there a way to un-ambiguously and algorithmically break a lujvo down
>> > into its component gismu?
>> >
>>
>> I haven't rigorously looked at this, so please excuse me if I'm way
>> off base.
>>
>> What if you start at the left side of the word and match characters
>> until you get a matching rafsi, then look for optional buffer
>> characters before matching your next rafsi, &c? You could be much
>> more sophisticated by adding detection for valid lerfu clustering
>> to throw out what would otherwise be an ambiguous case.
>>
>> It sounds like you're working top down on the problem rather than
>> going from left to right, but I don't know what is wrong with my
>> suggestion yet.
>>
>> I see you've provided 3 simple examples, but can you provide an
>> example for morsi which you mention at the end?
>>
>> -Alan
>> --
>> .i ko djuno fi le do sevzi
>>
>> --
>> You received this message because you are subscribed to the Google Groups "lojban" group.
>> To post to this group, send email to lojban@googlegroups.com.
>> To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
>> For more options, visit this group at http://groups.google.com/group/lojban?hl=en.

--20cf303f6b2049cd1d0493c4e942
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Actually I guess that was a bad example at the end because a lujvo ending with "rat" would definitely be wrong. But you get where I'm going with it.

On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <lukeabergen@gmail.com> wrote:
Sorry, yes, I was providing very rough pseudocode for my script. I do look from left to right. But since rafsi are always 3 letters (minus any ' characters and excluding 4-letter rafsi), I take them in chunks of 3.

An example with morsi would be "xamymro". My code would go like:
grab leftmost three chars, check for .y'ys and grab a fourth char if there is a .y'y
look up the rafsi, chop off what you found to be the "leftmost" rafsi and loop again with what you have left
Now we're looking at "ymro"
Strip off "y" and we're left with "mro". Now because I'm assuming that "r", "l", "m", or "n" followed by a consonant is a buffer consonant, I see "mro" and think "ok, the 'm' is a buffer consonant so grab another char so we're back to a 3-letter rafsi". I then try to grab whatever comes after "o" and get a null-pointer or some such.
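The "grab three letters, not counting apostrophes" step can be sketched in Python roughly as below. This is only an illustration: `take_chunk` is a made-up name, not the script's actual function. It returns `None` when the word runs out, so the "mro" case above would fail visibly instead of hitting a null pointer.

```python
def take_chunk(word, letters=3):
    """Split off the leftmost chunk holding `letters` letters.

    Apostrophes are carried along but not counted, so "ba'u" counts as
    three letters. Returns (chunk, rest), or None if the word is too short.
    """
    count = 0
    for i, ch in enumerate(word):
        if ch != "'":
            count += 1
        if count == letters:
            return word[:i + 1], word[i + 1:]
    return None  # ran out of letters: report it instead of crashing


print(take_chunk("xamymro"))
print(take_chunk("ba'udri"))
print(take_chunk("ro"))
```

With this, dropping the "m" from "mro" leaves "ro", and `take_chunk("ro")` reports the problem by returning `None`.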

It just occurred to me that I might deal with 4-letter rafsi by keeping in mind that they are always followed by "y". So my revised "grab leftmost rafsi" code would look something like:

word = "xajmymro"
if (word matches "....y") // i.e. any 4 characters followed by a "y"
  return substring(word, 0, 4)

Then in the calling function I just have to try rafsi+a, rafsi+e, etc. until I find one that is a gismu.
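As a rough Python sketch of that idea (the function names and the two-word `GISMU` set are placeholders for illustration; a real script would load the full gismu list):

```python
VOWELS = "aeiou"

# Hypothetical stand-in for a full gismu list: just the two words from this thread.
GISMU = {"xajmi", "morsi"}


def leading_four_letter_rafsi(word):
    """Return the first 4 characters if they are directly followed by "y"."""
    if len(word) > 4 and word[4] == "y" and "'" not in word[:4]:
        return word[:4]
    return None


def gismu_for_four_letter_rafsi(rafsi, gismu=GISMU):
    """Try rafsi+a, rafsi+e, ... and return the first one that is a gismu."""
    for vowel in VOWELS:
        candidate = rafsi + vowel
        if candidate in gismu:
            return candidate
    return None


rafsi = leading_four_letter_rafsi("xajmymro")
print(rafsi, gismu_for_four_letter_rafsi(rafsi))
```

Note that "xamymro" does not trigger the 4-letter branch, because its fifth character is "m" rather than "y", so it falls through to the normal 3-letter handling.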

I'm still stuck on the buffer consonant problem though.

It feels wrong to use guesswork like "if you see [r|l|m|n]C then check to see if it's a valid rafsi, if it's not, strip off the [r|l|m|n], grab another char from the right, and look THAT up and see if it's a rafsi".

Here's a non-code way to think of the problem. How would a parser figure out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat ro ci"?
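One way to avoid the guesswork is to let the rafsi table decide: at each position try every table entry that fits, optionally skip one hyphen letter, and backtrack when the remainder cannot be parsed. The Python sketch below assumes a tiny hand-entered `RAFSI` table and a `FINAL_GISMU` set, both of which should be checked against a real rafsi list, and it assumes the only hyphen letters are "y", "r" and "n", which is how I read the reference grammar (it does not list "l" or "m" as hyphens).

```python
# Illustrative table only; a real script would load the complete rafsi list.
RAFSI = {
    "xam": "xajmi", "mro": "morsi", "ba'u": "bacru", "bei": "bevri",
    "baj": "bajra", "dri": "badri", "bag": "bargu", "pau": "pagbu",
    "co'a": "co'a", "bra": "barda",
}
FINAL_GISMU = {"troci"}      # full gismu allowed as the last component
HYPHENS = ("y", "r", "n")    # assumption: "l" and "m" are never hyphens


def segmentations(word):
    """Return every way to split `word` into known rafsi (hyphens dropped)."""
    if not word:
        return [[]]
    results = []
    if word in FINAL_GISMU:
        results.append([word])
    for length in (3, 4):    # "CVC"/"CCV"/"CVV" are 3 chars, "CV'V" is 4
        if len(word) < length or word[:length] not in RAFSI:
            continue
        piece, rest = word[:length], word[length:]
        tails = [rest]
        if len(rest) > 1 and rest[0] in HYPHENS:
            tails.append(rest[1:])   # also try treating the next letter as a hyphen
        for tail in tails:
            for parse in segmentations(tail):
                results.append([piece] + parse)
    return results


for lujvo in ("xamymro", "ba'urbei", "co'amrobratroci"):
    print(lujvo, segmentations(lujvo))
```

With this table each of the three words has exactly one parse, and "mro" is simply found as a rafsi rather than being mistaken for a buffer letter plus something else. If a word ever came back with more than one parse, that would be a genuinely ambiguous case to inspect by hand.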

On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:
On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>    When I first started learning lojban I wrote up a quick'n dirty script to
>    make looking up words faster and easier. gismu and cmavo were easy, but I
>    could never figure out lujvo. So I'm taking another stab at it. I
>    currently have something that works in the general cases of {bajdri},
>    {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
>    letter rafsi and non "y" buffer letters.
>    To deal with the non "y" buffer letters I thought I could just say:
>    strip all "y" from the word
>    get first three non "'" chars
>    if the first letter is "r", "l", "m", or "n" and the second letter is a
>    consonant, then chop off the first letter and grab another letter from the
>    right
>    (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>    handling ba'u in the first iteration) end up with "rbe" and due to the
>    above step, I'd strip off the "r" and grab the next letter thus ending
>    with "bei" which is the right result).
>    But this produces strange results because there ARE cases where buffer
>    letters are followed by consonants (morsi for instance).
>    Is there a way to un-ambiguously and algorithmically break a lujvo down
>    into its component gismu?
>
>

I haven't rigorously looked at this, so please excuse me if I'm way
off base.

What if you start at the left side of the word and match characters
until you get a matching rafsi, then look for optional buffer
characters before matching your next rafsi, &c? You could be much
more sophisticated by adding detection for valid lerfu clustering
to throw out what would otherwise be an ambiguous case.
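A minimal Python sketch of the "valid lerfu clustering" check follows. It encodes the consonant-pair restrictions as I recall them from the reference grammar's morphology chapter, so please verify them before relying on this; the function names are invented for the example, and the grammar's further hyphenation rules are not covered.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOICED = set("bdgvjz")
UNVOICED = set("ptkfcsx")
SIBILANTS = set("cjsz")
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}


def permissible_pair(a, b):
    """True if consonant `a` may be directly followed by consonant `b`."""
    if a == b:
        return False                      # doubled consonant
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False                      # voicing mismatch (l, m, n, r exempt)
    if a in SIBILANTS and b in SIBILANTS:
        return False                      # both drawn from c, j, s, z
    return a + b not in FORBIDDEN


def needs_y_hyphen(left, right):
    """True if joining two rafsi would create an impermissible consonant pair."""
    a, b = left[-1], right[0]
    if a not in CONSONANTS or b not in CONSONANTS:
        return False                      # the pair rule only concerns consonants
    return not permissible_pair(a, b)


print(needs_y_hyphen("xam", "mro"))   # "mm" would be a doubled consonant
print(needs_y_hyphen("bag", "pau"))   # "gp" mixes voiced and unvoiced
print(needs_y_hyphen("baj", "dri"))   # "jd" is allowed
```

This agrees with the examples in the thread: {bagypau} and "xamymro" carry a "y", while {bajdri} does not. A parser could use the same check in reverse to reject a candidate split whose junction could never have been written without a hyphen.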

It sounds like you're working top down on the problem rather than
going from left to right, but I don't know what is wrong with my
suggestion yet.

I see you've provided 3 simple examples, but can you provide an
example for morsi which you mention at the end?

-Alan
--
.i ko djuno fi le do sevzi




--20cf303f6b2049cd1d0493c4e942--