From lojban+bncCLr6ktCfBBDxj6zmBBoEWe_Nnw@googlegroups.com Fri Oct 29 10:44:34 2010
Received: from mail-yx0-f189.google.com ([209.85.213.189])
	by chain.digitalkingdom.org with esmtp (Exim 4.72)
	(envelope-from <lojban+bncCLr6ktCfBBDxj6zmBBoEWe_Nnw@googlegroups.com>)
	id 1PBt09-0002AF-0b; Fri, 29 Oct 2010 10:44:34 -0700
Received: by yxe42 with SMTP id 42sf4772781yxe.16
        for <multiple recipients>; Fri, 29 Oct 2010 10:44:23 -0700 (PDT)
Received: by 10.150.172.6 with SMTP id u6mr1882482ybe.77.1288374257445;
        Fri, 29 Oct 2010 10:44:17 -0700 (PDT)
X-BeenThere: lojban@googlegroups.com
Received: by 10.100.54.26 with SMTP id c26ls1040745ana.2.p; Fri, 29 Oct 2010
 10:44:16 -0700 (PDT)
Received: by 10.100.122.2 with SMTP id u2mr4581611anc.11.1288374256709;
        Fri, 29 Oct 2010 10:44:16 -0700 (PDT)
Received: by 10.100.122.2 with SMTP id u2mr4581610anc.11.1288374256696;
        Fri, 29 Oct 2010 10:44:16 -0700 (PDT)
Received: from mail-yw0-f52.google.com (mail-yw0-f52.google.com [209.85.213.52])
        by gmr-mx.google.com with ESMTP id x38si881189anx.7.2010.10.29.10.44.16;
        Fri, 29 Oct 2010 10:44:16 -0700 (PDT)
Received-SPF: neutral (google.com: 209.85.213.52 is neither permitted nor denied by best guess record for domain of alanpost@sunflowerriver.org) client-ip=209.85.213.52;
Received: by ywf7 with SMTP id 7so2310715ywf.11
        for <lojban@googlegroups.com>; Fri, 29 Oct 2010 10:44:16 -0700 (PDT)
Received: by 10.91.13.18 with SMTP id q18mr4719735agi.50.1288374255576;
        Fri, 29 Oct 2010 10:44:15 -0700 (PDT)
Received: from sunflowerriver.org (173-10-243-253-Albuquerque.hfc.comcastbusiness.net [173.10.243.253])
        by mx.google.com with ESMTPS id r25sm1883896yhc.0.2010.10.29.10.44.13
        (version=TLSv1/SSLv3 cipher=RC4-MD5);
        Fri, 29 Oct 2010 10:44:14 -0700 (PDT)
Date: Fri, 29 Oct 2010 11:44:11 -0600
From: ".alyn.post." <alyn.post@lodockikumazvati.org>
To: lojban@googlegroups.com
Subject: Re: [lojban] lujvo deconstruction
Message-ID: <20101029174411.GF47249@alice.local>
Mail-Followup-To: lojban@googlegroups.com
References: <AANLkTik2apwYUT40-wMWcd_Wjj4B4aERKNsHVq_MCf=P@mail.gmail.com> <20101029170344.GB47249@alice.local> <AANLkTimEdWEmcwzgGm6=Fq3tgguQ1K_0uff7MKb5aZLU@mail.gmail.com> <AANLkTim4OyJoDtdJz_gopRdJrtg-4oYgZ1MgMBp0MLD+@mail.gmail.com>
Mime-Version: 1.0
In-Reply-To: <AANLkTim4OyJoDtdJz_gopRdJrtg-4oYgZ1MgMBp0MLD+@mail.gmail.com>
X-Original-Sender: alyn.post@lodockikumazvati.org
X-Original-Authentication-Results: gmr-mx.google.com; spf=neutral (google.com:
 209.85.213.52 is neither permitted nor denied by best guess record for domain
 of alanpost@sunflowerriver.org) smtp.mail=alanpost@sunflowerriver.org
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID: <lojban.googlegroups.com>
List-Post: <http://groups.google.com/group/lojban/post?hl=en_US>, <mailto:lojban@googlegroups.com>
List-Help: <http://groups.google.com/support/?hl=en_US>, <mailto:lojban+help@googlegroups.com>
List-Archive: <http://groups.google.com/group/lojban?hl=en_US>
Sender: lojban@googlegroups.com
List-Subscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+subscribe@googlegroups.com>
List-Unsubscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+unsubscribe@googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline

I think your message here contains the kernel of the solution:
you can't just chop off three letters and call that a rafsi; you
must grab three (or four) letters that form a valid rafsi, or it
isn't one.
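To make "form a valid rafsi" concrete, here is a rough Python sketch
of a cheap pre-filter a lookup script could run on a candidate chunk
before consulting the rafsi list. It checks only two things: that the
chunk has one of the six rafsi shapes (CVC, CCV, CVV, CV'V, CVCC,
CCVC), and that a chunk starting with two consonants starts with a
permissible initial pair. The function names are my own invention,
and passing this test is necessary but not sufficient: the chunk
still has to appear in the official rafsi list, and the sketch does
not check things like which vowel pairs a CVV rafsi may use.

```python
# Lojban letters as used inside rafsi (the letter "y" never appears in one).
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")

# The six rafsi shapes; "'" stands for the apostrophe in CV'V rafsi.
RAFSI_SHAPES = {"CVC", "CCV", "CVV", "CV'V", "CVCC", "CCVC"}

# Permissible initial consonant pairs; a CCV or CCVC rafsi must start with one.
PERMISSIBLE_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr "
    "zb zd zg zm zv".split()
)


def shape_of(s):
    """Map each letter to C, V or ' ; return None on any other character."""
    out = []
    for ch in s:
        if ch in VOWELS:
            out.append("V")
        elif ch in CONSONANTS:
            out.append("C")
        elif ch == "'":
            out.append("'")
        else:
            return None
    return "".join(out)


def could_be_rafsi(s):
    """Necessary (not sufficient) test: a rafsi shape and a legal initial pair."""
    shape = shape_of(s)
    if shape not in RAFSI_SHAPES:
        return False
    if shape.startswith("CC") and s[:2] not in PERMISSIBLE_INITIALS:
        return False
    return True
```

With this, could_be_rafsi("mro") and could_be_rafsi("ba'u") are True,
while could_be_rafsi("rbe"), the chunk from the "ba'urbei" walkthrough
quoted below, is False because "rb" is not a permissible initial pair.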

The PEG grammar for Lojban morphology,

http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm

shows what makes a valid lujvo, and the process is more subtle than
"grab three letters and pretend they're a rafsi."  But if you follow
the formal grammar, you'll get an abstract syntax tree that fully
delimits each piece of the lujvo.
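As a much smaller illustration of the same idea (this is not the PEG
grammar, just a sketch of "only ever match real rafsi"), here is a
backtracking splitter in Python. It walks left to right and accepts a
chunk only if it is in a rafsi table, trying both 3- and 4-character
chunks, and it allows a hyphen letter only right after a rafsi. The
table is a hypothetical sample built from words mentioned in this
thread (a real tool would load the full rafsi list), and the hyphen
rule ("y" after a consonant-final rafsi, "r" or "n" after a
vowel-final one) is a simplifying assumption; the formal grammar is
stricter about where each hyphen may appear.

```python
VOWELS = set("aeiou")

# Hypothetical sample table (rafsi -> gismu). The pairs come from examples in
# this thread; a real tool would load the complete official rafsi list.
RAFSI = {
    "ba'u": "bacru",
    "bei": "bevri",
    "mro": "morsi",
    "xajm": "xajmi",
}


def segment(word, table=RAFSI):
    """Return every split of `word` into known rafsi plus hyphen letters."""
    results = []

    def walk(pos, pieces):
        if pos == len(word):
            # A lujvo must end on a rafsi, never on a hyphen letter.
            if pieces and pieces[-1] in table:
                results.append(list(pieces))
            return
        # Simplified hyphen rule (assumption): "y" may follow a rafsi ending
        # in a consonant; "r" or "n" may follow a rafsi ending in a vowel.
        if pieces and pieces[-1] in table:
            prev_last, ch = pieces[-1][-1], word[pos]
            if (ch == "y" and prev_last not in VOWELS) or (
                ch in "rn" and prev_last in VOWELS
            ):
                pieces.append(ch)
                walk(pos + 1, pieces)
                pieces.pop()
        # Accept a chunk only if it really is a rafsi: 3 chars (CVC, CCV,
        # CVV) or 4 chars (CV'V, CVCC, CCVC).
        for length in (3, 4):
            chunk = word[pos:pos + length]
            if len(chunk) == length and chunk in table:
                pieces.append(chunk)
                walk(pos + length, pieces)
                pieces.pop()

    walk(0, [])
    return results
```

For example, segment("ba'urbei") returns [["ba'u", "r", "bei"]] and
segment("xajmymro") returns [["xajm", "y", "mro"]]. There is no
guessing about "rbe": it simply never matches, because it isn't in
the table. If a word is truly ambiguous under the table, the function
returns more than one split rather than silently picking one.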

-Alan

On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
>    Actually I guess that was a bad example at the end because a lujvo ending
>    with "rat" would definitely be wrong. But you get where I'm going with it.
> 
>    On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <[1]lukeabergen@gmail.com>
>    wrote:
> 
>      Sorry, yes, I was providing very rough pseudocode for my script. I do
>      look from left to right. But since rafsi are always 3 letters (not
>      counting any ' characters, and excluding 4-letter rafsi), I take them in
>      chunks of 3. An example with morsi would be "xamymro". My code would go
>      like this:
>      grab the leftmost three chars, check for a .y'y, and grab a fourth char
>      if there is one;
>      look up the rafsi, chop off what you found to be the leftmost rafsi,
>      and loop again with what you have left.
>      Now we're looking at "ymro".
>      Strip off "y" and we're left with "mro". Now, because I'm assuming that
>      "r", "l", "m", or "n" followed by a consonant is a buffer consonant, I
>      see "mro" and think "OK, the 'm' is a buffer consonant, so grab another
>      char so we're back to a 3-letter rafsi". I then try to grab whatever
>      comes after "o" and get a null pointer or some such.
>      It just occurred to me that I might deal with 4-letter rafsi by keeping
>      in mind that they are always followed by a "y". So my revised "grab
>      leftmost rafsi" code would look something like:
>      word = "xajmymro"
>      if word matches "^....y"   // any 4 characters followed by a "y"
>          return substring(word, 0, 4)
>      Then in the calling function I just have to look for gismu of the form
>      rafsi+a, rafsi+e, etc. until I find one that matches a gismu.
>      I'm still stuck on the buffer consonant problem, though.
>      It feels wrong to use guesswork like "if you see [r|l|m|n]C, check
>      whether it's a valid rafsi; if it's not, strip off the [r|l|m|n], grab
>      another char from the right, and look THAT up to see if it's a rafsi".
>      Here's a non-code way to think of the problem: how would a parser figure
>      out whether "co'amrobratroci" is "co'a mro bra troci" or
>      "co'a m rob rat ro ci"?
>      On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post.
>      <[2]alyn.post@lodockikumazvati.org> wrote:
> 
>        On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>        > When I first started learning lojban I wrote up a quick'n dirty script to
>        > make looking up words faster and easier. gismu and cmavo were easy, but I
>        > could never figure out lujvo. So I'm taking another stab at it. I
>        > currently have something that works in the general cases of {bajdri},
>        > {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
>        > letter rafsi and non "y" buffer letters.
>        > To deal with the non "y" buffer letters I thought I could just say:
>        > strip all "y" from the word
>        > get first three non "'" chars
>        > if the first letter is "r", "l", "m", or "n" and the second letter is a
>        > consonant, then chop off the first letter and grab another letter from the
>        > right
>        > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>        > handling ba'u in the first iteration) end up with "rbe" and due to the
>        > above step, I'd strip off the "r" and grab the next letter thus ending
>        > with "bei" which is the right result).
>        > But this produces strange results because there ARE cases where buffer
>        > letters are followed by consonants (morsi for instance).
>        > Is there a way to un-ambiguously and algorithmically break a lujvo down
>        > into its component gismu?
>        >
> 
>        I haven't rigorously looked at this, so please excuse me if I'm way
>        off base.
> 
>        What if you start at the left side of the word and match characters
>        until you get a matching rafsi, then look for optional buffer
>        characters before matching your next rafsi, &c? You could be much
>        more sophisticated by adding detection for valid lerfu clustering
>        to throw out what would otherwise be an ambiguous case.
> 
>        It sounds like you're working top down on the problem rather than
>        going from left to right, but I don't know what is wrong with my
>        suggestion yet.
> 
>        I see you've provided 3 simple examples, but can you provide an
>        example for morsi which you mention at the end?
> 
>        -Alan
>        --
>        .i ko djuno fi le do sevzi
>        --
>        You received this message because you are subscribed to the Google
>        Groups "lojban" group.
>        To post to this group, send email to [3]lojban@googlegroups.com.
>        To unsubscribe from this group, send email to
>        [4]lojban+unsubscribe@googlegroups.com.
>        For more options, visit this group at
>        [5]http://groups.google.com/group/lojban?hl=en.
> 
>    --
>    You received this message because you are subscribed to the Google Groups
>    "lojban" group.
>    To post to this group, send email to lojban@googlegroups.com.
>    To unsubscribe from this group, send email to
>    lojban+unsubscribe@googlegroups.com.
>    For more options, visit this group at
>    http://groups.google.com/group/lojban?hl=en.
> 
> References
> 
>    Visible links
>    1. mailto:lukeabergen@gmail.com
>    2. mailto:alyn.post@lodockikumazvati.org
>    3. mailto:lojban@googlegroups.com
>    4. mailto:lojban%2Bunsubscribe@googlegroups.com
>    5. http://groups.google.com/group/lojban?hl=en

-- 
.i ko djuno fi le do sevzi
