From: Luke Bergen <lukeabergen@gmail.com>
To: lojban@googlegroups.com
Date: Fri, 29 Oct 2010 13:34:56 -0400
Subject: Re: [lojban] lujvo deconstruction

Sorry, yes, I was providing very rough pseudocode for my script. I do
look from left to right. But since rafsi are always 3 letters (minus any
' characters and excluding 4 letter rafsi), I take them in chunks of 3.

An example with morsi would be "xamymro".
My code would go like:
  grab the leftmost three chars, check for .y'y (apostrophe) characters,
  and grab a fourth char if there is one
  look up the rafsi, chop off what you found to be the "leftmost" rafsi,
  and loop again with what you have left
Now we're looking at "ymro".
Strip off "y" and we're left with "mro". Now, because I'm assuming that
"r", "l", "m", or "n" followed by a consonant is a buffer consonant, I
see "mro" and think "ok, the 'm' is a buffer consonant, so grab another
char so we're back to a 3 letter rafsi". I then try to grab whatever
comes after "o" and get a null pointer or some such.

It just occurred to me that I might deal with 4 letter rafsi by keeping
in mind that they are always followed by "y". So my revised "grab
leftmost rafsi" code would look something like:

  word = "xajmymro"
  if (word starts with "....y")  // any 4 characters followed by a "y"
    return substring(word, 0, 4)

Then in the calling function I just have to look for gismu of the form
rafsi+a, rafsi+e, etc. till I find one that matches a gismu.

I'm still stuck on the buffer consonant problem though.

It feels wrong to use guesswork like "if you see [r|l|m|n]C then check to
see if it's a valid rafsi; if it's not, strip off the [r|l|m|n], grab
another char from the right, and look THAT up and see if it's a rafsi".

Here's a non-code way to think of the problem. How would a parser figure
out whether "co'amrobratroci" is "co'a mro bra troci" or
"co'a m rob rat ro ci"?

On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:
> On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> > When I first started learning lojban I wrote up a quick'n dirty
> > script to make looking up words faster and easier. gismu and cmavo
> > were easy, but I could never figure out lujvo. So I'm taking another
> > stab at it. I currently have something that works in the general
> > cases of {bajdri}, {ba'udri}, and {bagypau}. But currently I'm not
> > sure how to deal with 4 letter rafsi and non "y" buffer letters.
> > To deal with the non "y" buffer letters I thought I could just say:
> >   strip all "y" from the word
> >   get first three non "'" chars
> >   if the first letter is "r", "l", "m", or "n" and the second letter
> >   is a consonant, then chop off the first letter and grab another
> >   letter from the right
> > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
> > handling ba'u in the first iteration) end up with "rbe" and due to
> > the above step, I'd strip off the "r" and grab the next letter thus
> > ending with "bei" which is the right result).
> > But this produces strange results because there ARE cases where
> > buffer letters are followed by consonants (morsi for instance).
> > Is there a way to un-ambiguously and algorithmically break a lujvo
> > down into its component gismu?
> >
>
> I haven't rigorously looked at this, so please excuse me if I'm way
> off base.
>
> What if you start at the left side of the word and match characters
> until you get a matching rafsi, then look for optional buffer
> characters before matching your next rafsi, &c? You could be much
> more sophisticated by adding detection for valid lerfu clustering
> to throw out what would otherwise be an ambiguous case.
>
> It sounds like you're working top down on the problem rather than
> going from left to right, but I don't know what is wrong with my
> suggestion yet.
>
> I see you've provided 3 simple examples, but can you provide an
> example for morsi which you mention at the end?
>
> -Alan
> --
> .i ko djuno fi le do sevzi
>
> --
> You received this message because you are subscribed to the Google
> Groups "lojban" group.
> To post to this group, send email to lojban@googlegroups.com.
> To unsubscribe from this group, send email to
> lojban+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/lojban?hl=en.
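
Coming back to the "co'amrobratroci" question: one way to avoid the
[r|l|m|n]C guesswork might be to try every reading and keep only those in
which every piece is a known rafsi. Below is a minimal Python sketch of
that idea, not a finished parser. KNOWN_RAFSI is a tiny made-up stand-in
for a real rafsi list, split_lujvo is a hypothetical name, and the buffer
set is simply the r/l/m/n assumption used in this thread.

```python
from typing import List

# Tiny illustrative stand-in for a real rafsi list (an assumption, not complete).
KNOWN_RAFSI = {"co'a", "mro", "bra", "troci", "ba'u", "bei", "xam", "xajm"}
VOWELS = set("aeiou")
# Buffer consonants as assumed in this thread; adjust to match the real rules.
BUFFER_CONSONANTS = set("rlmn")


def split_lujvo(word: str) -> List[List[str]]:
    """Return every way to read `word` as known rafsi with optional buffers."""
    if not word:
        return [[]]  # nothing left: exactly one (empty) continuation
    results = []
    for rafsi in sorted(KNOWN_RAFSI):  # sorted only to make output stable
        if not word.startswith(rafsi):
            continue
        rest = word[len(rafsi):]
        # Try the remainder as-is, and also with one buffer letter skipped.
        tails = [rest]
        if len(rest) > 1:
            if rest[0] == "y":
                tails.append(rest[1:])
            elif (rest[0] in BUFFER_CONSONANTS
                  and rest[1] not in VOWELS and rest[1] != "'"):
                tails.append(rest[1:])
        for tail in tails:
            for parse in split_lujvo(tail):
                results.append([rafsi] + parse)
    return results


if __name__ == "__main__":
    for w in ("xamymro", "xajmymro", "ba'urbei", "co'amrobratroci"):
        print(w, split_lujvo(w))
```

With this toy set, "co'amrobratroci" comes back with the single reading
co'a + mro + bra + troci, because nothing in the set matches at
"robratroci". With a fuller list, the other reading would survive only if
"rob", "rat", "ro", and "ci" were all in it, so any remaining ambiguity
shows up as more than one result instead of a guess.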