From: Luke Bergen <lukeabergen@gmail.com>
To: lojban@googlegroups.com
Date: Fri, 29 Oct 2010 13:55:42 -0400
Subject: Re: [lojban] lujvo deconstruction
In-Reply-To: <20101029174411.GF47249@alice.local>
References: <20101029170344.GB47249@alice.local> <20101029174411.GF47249@alice.local>

well shite. I was hoping to get away with a shortcut that wouldn't require
me to learn and implement a piece of the peg grammar. I don't even know
PEG. I guess I have a good reason to now.

On Fri, Oct 29, 2010 at 1:44 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:
> I think your message here contains the kernel of the solution,
> namely that you can't just chop off three letters and call that a
> rafsi, but you must grab three (four) letters that form a valid
> rafsi or it isn't one.
>
> The PEG grammar for Lojban morphology:
>
> http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm
>
> Shows what makes a valid Lujvo, and the process is more subtle than
> "grab three letters and pretend they're a rafsi." But if you follow
> the formal grammar, you'll get an abstract syntax tree that fully
> delimits each piece of the lujvo.
>
> -Alan
>
> On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
> > Actually I guess that was a bad example at the end because a lujvo
> > ending with "rat" would definitely be wrong. But you get where I'm
> > going with it.
> >
> > On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <lukeabergen@gmail.com> wrote:
> > > Sorry, yes, I was providing very rough pseudocode for my script. I do
> > > look from left to right. But since rafsi are always 3 letters (minus
> > > any ' characters and excluding 4 letter rafsi), I take them in chunks
> > > of 3. an example with morsi would be "xamymro". My code would go like:
> > >
> > >   grab left most three chars, check for .y'ys and grab a fourth char
> > >   if there is a .y'y
> > >   look up the rafsi, chop off what you found to be the "leftmost"
> > >   rafsi and loop again with what you have left
> > >
> > > Now we're looking at "ymro"
> > > Strip off "y" and we're left with "mro". Now because I'm assuming that
> > > "r", "l", "m", or "n" followed by a consonant is a buffer vowel, I see
> > > "mro" and think "ok, the 'm' is a buffer vowel so grab another char so
> > > we're back to a 3 letter rafsi", I then try to grab whatever comes
> > > after "o" and get a null-pointer or some such.
> > >
> > > It just occurred to me that I might deal with 4 letter rafsi by
> > > keeping in mind that they always end with "y". So my revised "grab
> > > leftmost rafsi" code would look something like:
> > >
> > >   word = xajmymro
> > >   if (word = "....y") // where this is "word" = any 4 characters
> > >                       // followed by an "y"
> > >     return substring(word, 0, 4)
> > >
> > > Then in the calling function I just have to look for gismu of the form
> > > rafsi+a, rafsi+e, etc... till I find one that matches a gismu.
> > >
> > > I'm still stuck on the buffer consonant problem though.
> > >
> > > It feels wrong to use guesswork like "if you see [r|l|m|n]C then check
> > > to see if it's a valid rafsi, if it's not, strip off the [r|l|m|n],
> > > grab another char from the right, and look THAT up and see if it's a
> > > rafsi".
> > >
> > > Here's a non-code way to think of the problem. How would a parser
> > > figure out whether "co'amrobratroci" is "co'a mro bra troci" or
> > > "co'a m rob rat ro ci"?
> > >
> > > On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post.
> > > <alyn.post@lodockikumazvati.org> wrote:
> > > > On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> > > > > When I first started learning lojban I wrote up a quick'n dirty
> > > > > script to make looking up words faster and easier. gismu and
> > > > > cmavo were easy, but I could never figure out lujvo. So I'm
> > > > > taking another stab at it. I currently have something that works
> > > > > in the general cases of {bajdri}, {ba'udri}, and {bagypau}. But
> > > > > currently I'm not sure how to deal with 4 letter rafsi and non
> > > > > "y" buffer letters.
> > > > >
> > > > > To deal with the non "y" buffer letters I thought I could just say:
> > > > >
> > > > >   strip all "y" from the word
> > > > >   get first three non "'" chars
> > > > >   if the first letter is "r", "l", "m", or "n" and the second
> > > > >   letter is a consonant, then chop off the first letter and grab
> > > > >   another letter from the right
> > > > >
> > > > > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
> > > > > handling ba'u in the first iteration) end up with "rbe" and due to
> > > > > the above step, I'd strip off the "r" and grab the next letter
> > > > > thus ending with "bei" which is the right result).
> > > > >
> > > > > But this produces strange results because there ARE cases where
> > > > > buffer letters are followed by consonants (morsi for instance).
> > > > >
> > > > > Is there a way to un-ambiguously and algorithmically break a
> > > > > lujvo down into its component gismu?
> > > >
> > > > I haven't rigorously looked at this, so please excuse me if I'm way
> > > > off base.
> > > >
> > > > What if you start at the left side of the word and match characters
> > > > until you get a matching rafsi, then look for optional buffer
> > > > characters before matching your next rafsi, &c? You could be much
> > > > more sophisticated by adding detection for valid lerfu clustering
> > > > to throw out what would otherwise be an ambiguous case.
> > > >
> > > > It sounds like you're working top down on the problem rather than
> > > > going from left to right, but I don't know what is wrong with my
> > > > suggestion yet.
> > > >
> > > > I see you've provided 3 simple examples, but can you provide an
> > > > example for morsi which you mention at the end?
> > > >
> > > > -Alan
> > > > --
> > > > .i ko djuno fi le do sevzi
>
> --
> .i ko djuno fi le do sevzi

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.

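
A minimal Python sketch of the left-to-right matching discussed in the thread. It is not the PEG morphology grammar linked in the thread. It only tries every known rafsi at the current position and backtracks when the rest of the word cannot be split. Everything named here (`RAFSI`, `FINAL_GISMU`, `is_cvv`, `segment`) is made up for illustration, the table holds just a few entries taken from the thread's examples, and the hyphen handling ("y" after any rafsi, "r" or "n" after an initial CVV-shaped rafsi) is a deliberate simplification of the real rules.

```python
from typing import Dict, List

# Illustrative table only: a few rafsi from the examples in this thread.
# A real tool would load the complete rafsi list instead.
RAFSI: Dict[str, str] = {
    "baj": "bajra", "dri": "badri", "ba'u": "bacru", "bag": "bargu",
    "pau": "pagbu", "bei": "bevri", "xam": "xajmi", "mro": "morsi",
    "bra": "barda", "co'a": "co'a",
    "xajm": "xajmi",  # 4-letter rafsi: only accepted when followed by "y"
}
FINAL_GISMU = {"troci"}  # assumption: a full gismu may close the lujvo
VOWELS = set("aeiou")


def is_cvv(piece: str) -> bool:
    """True for CVV / CV'V shaped rafsi, which may take an r/n hyphen."""
    core = piece.replace("'", "")
    return (len(core) == 3 and core[0] not in VOWELS
            and core[1] in VOWELS and core[2] in VOWELS)


def segment(word: str, first: bool = True) -> List[List[str]]:
    """Return every split of `word` into known pieces, hyphens dropped."""
    if not word:
        return [[]]
    results: List[List[str]] = []
    if word in FINAL_GISMU:
        results.append([word])
    for length in (3, 4):  # 4 covers both CV'V rafsi and 4-letter rafsi
        piece, rest = word[:length], word[length:]
        if len(piece) < length or piece not in RAFSI:
            continue
        four_letter = length == 4 and "'" not in piece
        if rest.startswith("y"):
            if len(rest) == 1:
                continue  # a hyphen must be followed by another piece
            tails = [rest[1:]]  # consume the "y" hyphen
        elif four_letter:
            continue  # 4-letter rafsi without "y" is rejected
        else:
            tails = [rest]
            # Simplified r/n hyphen: only after an initial CVV-shaped rafsi.
            if first and is_cvv(piece) and rest[:1] in ("r", "n") and len(rest) > 1:
                tails.append(rest[1:])
        for tail in tails:
            for parts in segment(tail, first=False):
                results.append([piece] + parts)
    return results


def gismu_of(parts: List[str]) -> List[str]:
    """Map each piece back to its source word (gismu pieces map to themselves)."""
    return [RAFSI.get(p, p) for p in parts]
```

With this table, `segment("co'amrobratroci")` returns only the split co'a + mro + bra + troci. The competing reading would need a bare "m" as a piece, and since "m" is neither a rafsi nor a hyphen in this sketch, that branch is dropped during backtracking rather than by guesswork.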