From lojban+bncCOjSjrXVGBCJo6zmBBoELbl4jg@googlegroups.com Fri Oct 29 11:25:33 2010
Received: from mail-gx0-f189.google.com ([209.85.161.189])
	by chain.digitalkingdom.org with esmtp (Exim 4.72)
	(envelope-from <lojban+bncCOjSjrXVGBCJo6zmBBoELbl4jg@googlegroups.com>)
	id 1PBtdj-0005uZ-Jr; Fri, 29 Oct 2010 11:25:32 -0700
Received: by gxk28 with SMTP id 28sf4828774gxk.16
        for <multiple recipients>; Fri, 29 Oct 2010 11:25:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=beta;
        h=domainkey-signature:received:x-beenthere:received:received:received
         :received:received-spf:received:mime-version:received:received
         :in-reply-to:references:date:message-id:subject:from:to
         :x-original-sender:x-original-authentication-results:reply-to
         :precedence:mailing-list:list-id:list-post:list-help:list-archive
         :sender:list-subscribe:list-unsubscribe:content-type;
        bh=UKkIkeMcgGePgdZ8OwozP47k0sTxWKdSfZCsKcfhQgM=;
        b=jwrjnAjJD+GcAxYeFBhlOrzqOk3aclQgWz1kC7GzLa23U2hvN2mnuqoCPclooi2of6
         uHvfEeAqiRpxn6k4YVlQ1viruZ/qtqGO1Pfx2TRRof5EPl2JB75KAUw1KWeDSbmYcN9m
         aVIFF0JhLbTqISlO8wEOfyY9TckppvaqIleJo=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlegroups.com; s=beta;
        h=x-beenthere:received-spf:mime-version:in-reply-to:references:date
         :message-id:subject:from:to:x-original-sender
         :x-original-authentication-results:reply-to:precedence:mailing-list
         :list-id:list-post:list-help:list-archive:sender:list-subscribe
         :list-unsubscribe:content-type;
        b=IpZqYdqsNpKbOD+rDSFKasK/7G1ObCkVqFVU8TgSppIZ98uT5tngnn0FBPjZoeXakg
         rdXPbRv5VtIiZRMGH5omp2equm4FJ65ROoY5b7zpBc2doi93V44ufaN3+LsgqAmoxhmF
         OWbFfHQh+bm80XTVb1y9tn2azcirDS+TEnNxk=
Received: by 10.90.237.3 with SMTP id k3mr370061agh.12.1288376713251;
        Fri, 29 Oct 2010 11:25:13 -0700 (PDT)
X-BeenThere: lojban@googlegroups.com
Received: by 10.231.123.203 with SMTP id q11ls3131688ibr.2.p; Fri, 29 Oct 2010
 11:25:12 -0700 (PDT)
Received: by 10.231.161.81 with SMTP id q17mr3632354ibx.12.1288376712617;
        Fri, 29 Oct 2010 11:25:12 -0700 (PDT)
Received: by 10.231.161.81 with SMTP id q17mr3632353ibx.12.1288376712544;
        Fri, 29 Oct 2010 11:25:12 -0700 (PDT)
Received: from mail-iw0-f173.google.com (mail-iw0-f173.google.com [209.85.214.173])
        by gmr-mx.google.com with ESMTP id j25si3493201ibb.4.2010.10.29.11.25.11;
        Fri, 29 Oct 2010 11:25:11 -0700 (PDT)
Received-SPF: pass (google.com: domain of lukeabergen@gmail.com designates 209.85.214.173 as permitted sender) client-ip=209.85.214.173;
Received: by mail-iw0-f173.google.com with SMTP id 36so4225650iwn.4
        for <lojban@googlegroups.com>; Fri, 29 Oct 2010 11:25:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.231.35.138 with SMTP id p10mr11582994ibd.33.1288376711027;
 Fri, 29 Oct 2010 11:25:11 -0700 (PDT)
Received: by 10.231.149.14 with HTTP; Fri, 29 Oct 2010 11:25:10 -0700 (PDT)
In-Reply-To: <20101029181312.GG47249@alice.local>
References: <AANLkTik2apwYUT40-wMWcd_Wjj4B4aERKNsHVq_MCf=P@mail.gmail.com>
	<20101029170344.GB47249@alice.local>
	<AANLkTimEdWEmcwzgGm6=Fq3tgguQ1K_0uff7MKb5aZLU@mail.gmail.com>
	<AANLkTim4OyJoDtdJz_gopRdJrtg-4oYgZ1MgMBp0MLD+@mail.gmail.com>
	<20101029174411.GF47249@alice.local>
	<AANLkTik9RwOJmbp=7A1Kye85NigKMwzp0EBRsyMiQ+6E@mail.gmail.com>
	<20101029181312.GG47249@alice.local>
Date: Fri, 29 Oct 2010 14:25:10 -0400
Message-ID: <AANLkTi=3fwUwtNSLUJarBPiWgKNv1D3Ej31VS2s6Y2a4@mail.gmail.com>
Subject: Re: [lojban] lujvo deconstruction
From: Luke Bergen <lukeabergen@gmail.com>
To: lojban@googlegroups.com
X-Original-Sender: lukeabergen@gmail.com
X-Original-Authentication-Results: gmr-mx.google.com; spf=pass (google.com:
 domain of lukeabergen@gmail.com designates 209.85.214.173 as permitted
 sender) smtp.mail=lukeabergen@gmail.com; dkim=pass (test mode) header.i=@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID: <lojban.googlegroups.com>
List-Post: <http://groups.google.com/group/lojban/post?hl=en_US>, <mailto:lojban@googlegroups.com>
List-Help: <http://groups.google.com/support/?hl=en_US>, <mailto:lojban+help@googlegroups.com>
List-Archive: <http://groups.google.com/group/lojban?hl=en_US>
Sender: lojban@googlegroups.com
List-Subscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+subscribe@googlegroups.com>
List-Unsubscribe: <http://groups.google.com/group/lojban/subscribe?hl=en_US>, <mailto:lojban+unsubscribe@googlegroups.com>
Content-Type: multipart/alternative; boundary=0022152d6e5932b9fe0493c594a9

--0022152d6e5932b9fe0493c594a9
Content-Type: text/plain; charset=ISO-8859-1

.oi I already have a bunch of code in place to handle gismu and cmavo.  I
was trying to just hook lujvo in, but it looks like lujvo are going to be a
lot harder.  For gismu and cmavo I just grab the word and look for it in a
text file that lists all the gismu and cmavo, accepting only matches where
the word occurs within the first few characters of a line, so as not to get
false positives where the word shows up in some other entry's definition.
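A minimal Python sketch of that lookup, assuming a word list with one entry per line and the headword preceded only by a little leading whitespace. The file layout, the function name `find_entries`, and the `max_offset` default are illustrative assumptions, not details taken from the script:

```python
import re


def find_entries(word, lines, max_offset=4):
    """Return the word-list lines whose headword is exactly `word`.

    Only leading whitespace (at most `max_offset` characters) may precede the
    word, so a mention inside another entry's definition is not a match.
    """
    # The negative lookahead stops "bac" from matching the start of "bacru".
    pattern = re.compile(r"^\s{0,%d}%s(?![a-z'])" % (max_offset, re.escape(word)))
    return [line for line in lines if pattern.match(line)]


if __name__ == "__main__":
    # Made-up sample lines; a real list would be read from the text file.
    sample = [
        " bacru  (definition text)",
        " bevri  (definition text)",
        " cmene  (definition text that happens to mention bacru)",
    ]
    print(find_entries("bacru", sample))
```

Anchoring the match at the start of the line is what keeps the third sample line from matching.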

I've been using AutoIt for this for a while and it's suited me well; it just
gets irritating when I want to look up a lujvo.  I chose AutoIt because it's
easy, has high utility, and makes creating global shortcut keys very easy.
Once I get the lujvo bit nailed down, maybe I'll release it.

On Fri, Oct 29, 2010 at 2:13 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:

> What language are you going to use/are you using?
>
> I've been teaching myself about PEG grammar and packrat parsing, and
> my experience so far has been quite positive.  Bryan Ford's master's
> thesis is quite easy to read, and as a technique packrat parsing is
> probably easier to understand than any other parsing technique.
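To make the packrat idea concrete, here is a toy PEG interpreter in Python that memoizes the result of every (rule, position) pair, which is what makes it packrat parsing. The expression encoding, `packrat_match`, and the `RULES` table are assumptions for illustration; the three rafsi shapes are a small subset and are not the BPFK PEG morphology grammar.

```python
from functools import lru_cache

# Expression encoding (an assumption for this sketch):
#   "Name"             -> reference to another rule in the table
#   "text"             -> literal text (any string that is not a rule name)
#   ("class", chars)   -> any single character from chars
#   ("seq", e1, e2..)  -> e1, then e2, then ...
#   ("alt", e1, e2..)  -> ordered choice: the first alternative that matches wins
#   ("opt", e)         -> e, or nothing
RULES = {
    "C": ("class", "bcdfgjklmnprstvxz"),
    "V": ("class", "aeiou"),
    "CVV": ("seq", "C", "V", ("opt", "'"), "V"),
    "CVC": ("seq", "C", "V", "C"),
    "CCV": ("seq", "C", "C", "V"),
    "Rafsi3": ("alt", "CVV", "CVC", "CCV"),
}


def packrat_match(rules, start, text):
    """Return the end of the prefix of `text` matched by rule `start`, or None."""

    @lru_cache(maxsize=None)
    def rule(name, pos):
        # Memoizing each (rule, position) pair is what makes this "packrat".
        return expr(rules[name], pos)

    def expr(e, pos):
        if isinstance(e, str):
            if e in rules:
                return rule(e, pos)
            return pos + len(e) if text.startswith(e, pos) else None
        kind = e[0]
        if kind == "class":
            return pos + 1 if pos < len(text) and text[pos] in e[1] else None
        if kind == "seq":
            for sub in e[1:]:
                pos = expr(sub, pos)
                if pos is None:
                    return None
            return pos
        if kind == "alt":
            for sub in e[1:]:
                end = expr(sub, pos)
                if end is not None:
                    return end
            return None
        if kind == "opt":
            end = expr(e[1], pos)
            return pos if end is None else end
        raise ValueError("unknown expression kind: %r" % (kind,))

    return rule(start, 0)
```

Because `alt` commits to the first alternative that matches, the order of alternatives matters; that ordered choice is one of the key differences between a PEG and a context-free grammar.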
>
> There are a *lot* of parsing problems in the world that have
> half-baked solutions, with someone trying to work around having to
> understand parsing.
>
> By way of an example, have a look at how syntax highlighting works
> in vim:
>
>  http://vim.wikia.com/wiki/Creating_your_own_syntax_files
>
> That is a stunning amount of work put into doing something the wrong
> way, all presumably to avoid having to actually learn about
> parsing.  I can't imagine writing a syntax file for vim that would Do
> The Right Thing(tm) with Lojban.  You'd be fighting the code trying
> to make it act like a parser.
>
> You won't regret using Lojban's formal grammar for your project.  :-)
>
> -Alan
>
> On Fri, Oct 29, 2010 at 01:55:42PM -0400, Luke Bergen wrote:
> >    well shite. I was hoping to get away with a shortcut that wouldn't
> >    require me to learn and implement a piece of the PEG grammar. I don't
> >    even know PEG. I guess I have a good reason to now.
> >
> >    On Fri, Oct 29, 2010 at 1:44 PM, .alyn.post.
> >    <[1]alyn.post@lodockikumazvati.org> wrote:
> >
> >      I think your message here contains the kernel of the solution,
> >      namely that you can't just chop off three letters and call that a
> >      rafsi, but you must grab three (four) letters that form a valid
> >      rafsi or it isn't one.
> >
> >      The PEG grammar for Lojban morphology:
> >
> >      [2] http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm
> >
> >      Shows what makes a valid lujvo, and the process is more subtle than
> >      "grab three letters and pretend they're a rafsi." But if you follow
> >      the formal grammar, you'll get an abstract syntax tree that fully
> >      delimits each piece of the lujvo.
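A minimal Python sketch of "grab letters that form a valid rafsi or it isn't one": a memoized backtracking search over a rafsi table, not the BPFK PEG grammar itself. `SHORT_RAFSI` and `GISMU` are a tiny hand-picked subset assumed from the official lists, and the hyphen handling is simplified: "y" may follow any non-final rafsi and must follow a 4-letter one, while "r" or "n" are allowed only after a CVV-form rafsi such as ba'u (the real rules are stricter). It does not check consonant clusters or other well-formedness rules, so it can accept strings that are not valid lujvo.

```python
from functools import lru_cache

# Hypothetical, hand-picked subset of the rafsi list; a real tool would load
# the full rafsi and gismu lists from files.
SHORT_RAFSI = {
    "bac": "bacru", "ba'u": "bacru",
    "bev": "bevri", "bei": "bevri",
    "dri": "badri",
    "mro": "morsi", "mor": "morsi",
}
GISMU = {"bacru", "bevri", "badri", "morsi"}
VOWELS = "aeiou"


def is_cvv(rafsi):
    """True for CVV / CV'V rafsi such as "bei" or "ba'u"."""
    s = rafsi.replace("'", "")
    return (len(s) == 3 and s[0] not in VOWELS
            and s[1] in VOWELS and s[2] in VOWELS)


def decompose(word):
    """Return the gismu a lujvo is built from, or None if no split is found."""

    @lru_cache(maxsize=None)
    def parse(i):
        # Parse word[i:] as one or more rafsi; return a tuple of gismu or None.
        rest = word[i:]
        if rest in GISMU:  # a full gismu may end the word
            return (rest,)
        for gismu in sorted(GISMU):  # 4-letter rafsi plus its "y" hyphen
            if rest.startswith(gismu[:4] + "y"):
                tail = parse(i + 5)
                if tail:
                    return (gismu,) + tail
        for rafsi, gismu in SHORT_RAFSI.items():
            if not rest.startswith(rafsi):
                continue
            j = i + len(rafsi)
            if j == len(word):
                return (gismu,)
            # Candidate hyphens: none, "y", and "r"/"n" only after CVV rafsi.
            hyphens = ["", "y"] + (["r", "n"] if is_cvv(rafsi) else [])
            for h in hyphens:
                if word.startswith(h, j):
                    tail = parse(j + len(h))
                    if tail:
                        return (gismu,) + tail
        return None  # nothing here is a known rafsi: backtrack

    result = parse(0)
    return list(result) if result else None
```

With this table, `decompose("ba'urbei")` splits into bacru and bevri without any "[r|l|m|n] followed by a consonant" guesswork: the "r" is only tried as a hyphen because ba'u is CVV-form, and "rbe" alone is never accepted because it is not a rafsi.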
> >
> >      -Alan
> >      On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
> >      > Actually I guess that was a bad example at the end because a lujvo
> >      > ending with "rat" would definitely be wrong. But you get where I'm
> >      > going with it.
> >      >
> >      > On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <[1][3]lukeabergen@gmail.com> wrote:
> >      >
> >      > Sorry, yes, I was providing very rough pseudocode for my script. I do
> >      > look from left to right. But since rafsi are always 3 letters (not
> >      > counting ' characters, and excluding 4-letter rafsi), I take them in
> >      > chunks of 3. An example with morsi would be "xamymro". My code would
> >      > go like this:
> >      >
> >      >   1. grab the leftmost three chars, check for .y'ys, and grab a
> >      >      fourth char if there is a .y'y
> >      >   2. look up the rafsi, chop off what you found to be the "leftmost"
> >      >      rafsi, and loop again with what you have left
> >      >
> >      > Now we're looking at "ymro". Strip off "y" and we're left with "mro".
> >      > Now, because I'm assuming that "r", "l", "m", or "n" followed by a
> >      > consonant is a buffer consonant, I see "mro" and think "ok, the 'm' is
> >      > a buffer consonant, so grab another char so we're back to a 3-letter
> >      > rafsi". I then try to grab whatever comes after "o" and get a
> >      > null-pointer or some such.
> >      > It just occurred to me that I might deal with 4-letter rafsi by
> >      > keeping in mind that they are always followed by a "y". So my revised
> >      > "grab leftmost rafsi" code would look something like:
> >      >
> >      >   word = "xajmymro"
> >      >   if (char_at(word, 4) == "y")   // 0-based: any 4 chars, then a "y"
> >      >       return substring(word, 0, 4)
> >      >
> >      > Then in the calling function I just have to look for gismu of the form
> >      > rafsi+a, rafsi+e, etc., till I find one that matches a gismu.
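The revised "grab leftmost rafsi" step and the vowel-completion lookup can be sketched in Python as below. Two assumptions go beyond the pseudocode: the four letters must also have a CVCC or CCVC shape (the forms a gismu minus its final vowel can take), and the gismu list is passed in as a set. All three function names are hypothetical.

```python
VOWELS = "aeiou"
CONSONANTS = "bcdfgjklmnprstvxz"


def shape(letters):
    """Map each letter to C or V (anything else to ?), e.g. "xajm" -> "CVCC"."""
    return "".join("V" if ch in VOWELS else "C" if ch in CONSONANTS else "?"
                   for ch in letters)


def leftmost_four_letter_rafsi(word):
    """Return word[:4] if it looks like a 4-letter rafsi followed by a y-hyphen."""
    if len(word) > 4 and word[4] == "y" and shape(word[:4]) in ("CVCC", "CCVC"):
        return word[:4]
    return None


def candidate_gismu(rafsi, gismu_list):
    """A 4-letter rafsi is a gismu minus its final vowel, so try each vowel."""
    return [rafsi + v for v in VOWELS if rafsi + v in gismu_list]
```

For the xajmymro example, `leftmost_four_letter_rafsi` returns "xajm", and `candidate_gismu("xajm", gismu_list)` returns ["xajmi"] provided xajmi is in the list.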
> >      > I'm still stuck on the buffer consonant problem, though. It feels
> >      > wrong to use guesswork like "if you see [r|l|m|n]C, then check to see
> >      > if it's a valid rafsi; if it's not, strip off the [r|l|m|n], grab
> >      > another char from the right, and look THAT up and see if it's a
> >      > rafsi".
> >      > Here's a non-code way to think of the problem: how would a parser
> >      > figure out whether "co'amrobratroci" is "co'a mro bra troci" or
> >      > "co'a m rob rat ro ci"?
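One way to see why a table of valid pieces settles the "co'amrobratroci" question is to enumerate every split of the string into known pieces. The sketch below drops the co'a prefix, and the `pieces` set is made up purely to mirror the two readings in the question; it is not checked against the real rafsi list.

```python
def segmentations(word, pieces):
    """Yield every way to split `word` into a sequence of strings from `pieces`."""
    if not word:
        yield []
        return
    for piece in sorted(pieces):
        if word.startswith(piece):
            for rest in segmentations(word[len(piece):], pieces):
                yield [piece] + rest


if __name__ == "__main__":
    pieces = {"mro", "bra", "troci", "rob", "rat", "ro", "ci"}
    print(list(segmentations("mrobratroci", pieces)))
    # Allowing a lone "m" lets the second reading (and others) through.
    print(list(segmentations("mrobratroci", pieces | {"m"})))
```

A lone "m" can never be a rafsi, so it never belongs in the table, and that alone rules out the second reading here. This brute-force version is exponential in the worst case; memoizing on the position, as a packrat parser does, avoids that.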
> >      > On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <[2][4]alyn.post@lodockikumazvati.org> wrote:
> >      >
> >      > On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> >      > > When I first started learning Lojban I wrote up a quick'n'dirty
> >      > > script to make looking up words faster and easier. gismu and cmavo
> >      > > were easy, but I could never figure out lujvo. So I'm taking another
> >      > > stab at it. I currently have something that works in the general
> >      > > cases of {bajdri}, {ba'udri}, and {bagypau}, but I'm not sure how to
> >      > > deal with 4-letter rafsi and non-"y" buffer letters.
> >      > >
> >      > > To deal with the non-"y" buffer letters I thought I could just say:
> >      > >   1. strip all "y" from the word
> >      > >   2. get the first three non-"'" chars
> >      > >   3. if the first letter is "r", "l", "m", or "n" and the second
> >      > >      letter is a consonant, then chop off the first letter and grab
> >      > >      another letter from the right
> >      > > (So if I was parsing "bacru zei bevri" = "ba'urbei", I would, after
> >      > > handling ba'u in the first iteration, end up with "rbe"; due to the
> >      > > above step, I'd strip off the "r" and grab the next letter, thus
> >      > > ending with "bei", which is the right result.)
> >      > >
> >      > > But this produces strange results, because there ARE cases where
> >      > > "r", "l", "m", or "n" followed by a consonant is part of the rafsi
> >      > > rather than a buffer letter (morsi, for instance).
> >      > >
> >      > > Is there a way to unambiguously and algorithmically break a lujvo
> >      > > down into its component gismu?
> >      > >
> >      >
> >      > I haven't rigorously looked at this, so please excuse me if I'm way
> >      > off base.
> >      >
> >      > What if you start at the left side of the word and match
> characters
> >      > until you get a matching rafsi, then look for optional buffer
> >      > characters before matching your next rafsi, &c? You could be much
> >      > more sophisticated by adding detection for valid lerfu clustering
> >      > to throw out what would otherwise be an ambiguous case.
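The "valid lerfu clustering" check could start from the permissible consonant-pair rules described in The Complete Lojban Language: the two consonants must differ, must not be one voiced and one unvoiced (l, m, n, r count as neither), must not both come from c, j, s, z, and must not be cx, kx, xc, xk, or mz. Where two rafsi would meet across an impermissible pair, a y-hyphen is needed. A minimal Python sketch follows; both function names are hypothetical, and `needs_y_hyphen` covers only this cluster case, not the other places a "y" is required (such as after a 4-letter rafsi).

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SIBILANTS = set("cjsz")
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}


def is_permissible_pair(a, b):
    """Check whether consonant a may directly precede consonant b."""
    if a == b:
        return False  # no doubled consonants
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False  # voicing must agree; l, m, n, r are in neither set
    if a in SIBILANTS and b in SIBILANTS:
        return False  # at most one of c, j, s, z
    return a + b not in FORBIDDEN


def needs_y_hyphen(left_rafsi, right_rafsi):
    """True if joining two rafsi puts an impermissible pair at the seam."""
    a, b = left_rafsi[-1], right_rafsi[0]
    both_consonants = a not in "aeiouy'" and b not in "aeiouy'"
    return both_consonants and not is_permissible_pair(a, b)
```

For example, joining xam and mro puts "m" next to "m", which is consistent with the example being spelled xamymro, while baj and dri can touch directly, as in bajdri.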
> >      >
> >      > It sounds like you're working top down on the problem rather than
> >      > going from left to right, but I don't know what is wrong with my
> >      > suggestion yet.
> >      >
> >      > I see you've provided 3 simple examples, but can you provide an
> >      > example for morsi which you mention at the end?
> >      >
> >      > -Alan
> >      > --
> >      > .i ko djuno fi le do sevzi
> >      > --
> >      > You received this message because you are subscribed to the Google
> >      > Groups "lojban" group.
> >      > To post to this group, send email to [3][5]lojban@googlegroups.com.
> >      > To unsubscribe from this group, send email to
> >      > [4][6]lojban+unsubscribe@googlegroups.com.
> >      > For more options, visit this group at
> >      > [5][7]http://groups.google.com/group/lojban?hl=en.
> >      >
> >      > References
> >      >
> >      > Visible links
> >      > 1. mailto:[11]lukeabergen@gmail.com
> >      > 2. mailto:[12]alyn.post@lodockikumazvati.org
> >      > 3. mailto:[13]lojban@googlegroups.com
> >      > 4. mailto:[14]lojban%2Bunsubscribe@googlegroups.com
> >      > 5. [15]http://groups.google.com/group/lojban?hl=en
> >      --
> >      .i ko djuno fi le do sevzi
> >
> > References
> >
> >    Visible links
> >    1. mailto:alyn.post@lodockikumazvati.org
> >    2. http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm
> >    3. mailto:lukeabergen@gmail.com
> >    4. mailto:alyn.post@lodockikumazvati.org
> >    5. mailto:lojban@googlegroups.com
> >    6. mailto:lojban%2Bunsubscribe@googlegroups.com
> >    7. http://groups.google.com/group/lojban?hl=en
> >    8. mailto:lojban@googlegroups.com
> >    9. mailto:lojban%2Bunsubscribe@googlegroups.com
> >   10. http://groups.google.com/group/lojban?hl=en
> >   11. mailto:lukeabergen@gmail.com
> >   12. mailto:alyn.post@lodockikumazvati.org
> >   13. mailto:lojban@googlegroups.com
> >   14. mailto:lojban%2Bunsubscribe@googlegroups.com
> >   15. http://groups.google.com/group/lojban?hl=en
> >   16. mailto:lojban@googlegroups.com
> >   17. mailto:lojban%2Bunsubscribe@googlegroups.com
> >   18. http://groups.google.com/group/lojban?hl=en
>
> --
> .i ko djuno fi le do sevzi

-- 
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.


--0022152d6e5932b9fe0493c594a9
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

.oi I already have a bunch of code in place to handle gismu and cmavo. =A0I=
 was trying to just hook lujvo in. =A0Looks like it&#39;s going to be a lot=
 harder for lujvo. =A0For gismu and cmavo I just grab the word and look for=
 it in a text file that has all the gismu and cmavo in it (and where the wo=
rd occurs within the first few characters of the line so as not to get fals=
e positives where the word shows up in the definition or some such).<div>
<br></div><div>I&#39;ve been using autoit for this for a while and it&#39;s=
 suited me well; it just gets irritating when I want to look up a lujvo. =A0=
I chose autoit because it&#39;s easy, has high utility, and makes creating =
global shortcut keys very easy. =A0Once I get the lujvo bit nailed down may=
be I&#39;ll release it.<br>
<br><div class=3D"gmail_quote">On Fri, Oct 29, 2010 at 2:13 PM, .alyn.post.=
 <span dir=3D"ltr">&lt;<a href=3D"mailto:alyn.post@lodockikumazvati.org">al=
yn.post@lodockikumazvati.org</a>&gt;</span> wrote:<br><blockquote class=3D"=
gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-=
left:1ex;">
What language are you going to use/are you using?<br>
<br>
I&#39;ve been teaching myself about PEG grammar and packrat parsing, and<br=
>
my experience so far has been quite positive. =A0Bryan Ford&#39;s master=
&#39;s<br>
thesis is quite easy to read, and as a technique packrat parsing is<br>
probably easier to understand than any other parsing technique.<br>
<br>
There are a *lot* of parsing problems in the world that have<br>
half-baked solutions, with someone trying to work around having to<br>
understand parsing.<br>
<br>
By way of an example, have a look at how syntax highlighting works<br>
in vim:<br>
<br>
 =A0<a href=3D"http://vim.wikia.com/wiki/Creating_your_own_syntax_files" ta=
rget=3D"_blank">http://vim.wikia.com/wiki/Creating_your_own_syntax_files</a=
><br>
<br>
That is a stunning amount of work put into doing something the wrong<br>
way, all presumably to avoid having to actually learn about<br>
parsing. =A0I can&#39;t imagine writing a syntax file for vim that would Do=
<br>
The Right Thing(tm) with Lojban. =A0You&#39;d be fighting the code trying<b=
r>
to make it act like a parser.<br>
<br>
You won&#39;t regret using Lojban&#39;s formal grammar for your project. =
=A0:-)<br>
<br>
-Alan<br>
<div class=3D"im"><br>
On Fri, Oct 29, 2010 at 01:55:42PM -0400, Luke Bergen wrote:<br>
&gt; =A0 =A0well shite. I was hoping to get away with a shortcut that would=
n&#39;t require<br>
&gt; =A0 =A0me to learn and implement a piece of the PEG grammar. I don&#39=
;t even know<br>
&gt; =A0 =A0PEG. I guess I have a good reason to now.<br>
&gt;<br>
&gt; =A0 =A0On Fri, Oct 29, 2010 at 1:44 PM, .alyn.post.<br>
</div><div class=3D"im">&gt; =A0 =A0&lt;[1]<a href=3D"mailto:alyn.post@lodo=
ckikumazvati.org">alyn.post@lodockikumazvati.org</a>&gt; wrote:<br>
&gt;<br>
&gt; =A0 =A0 =A0I think your message here contains the kernel of the soluti=
on,<br>
&gt; =A0 =A0 =A0namely that you can&#39;t just chop off three letters and c=
all that a<br>
&gt; =A0 =A0 =A0rafsi, but you must grab three (four) letters that form a v=
alid<br>
&gt; =A0 =A0 =A0rafsi or it isn&#39;t one.<br>
&gt;<br>
&gt; =A0 =A0 =A0The PEG grammar for Lojban morphology:<br>
&gt;<br>
</div>&gt; =A0 =A0 =A0[2]<a href=3D"http://www.lojban.org/tiki/tiki-index.p=
hp?page=3DBPFK+Section%3A+PEG+Morphology+Algorithm" target=3D"_blank">http:=
//www.lojban.org/tiki/tiki-index.php?page=3DBPFK+Section%3A+PEG+Morphology+=
Algorithm</a><br>

<div class=3D"im">&gt;<br>
&gt; =A0 =A0 =A0Shows what makes a valid lujvo, and the process is more sub=
tle than<br>
&gt; =A0 =A0 =A0&quot;grab three letters and pretend they&#39;re a rafsi.&q=
uot; But if you follow<br>
&gt; =A0 =A0 =A0the formal grammar, you&#39;ll get an abstract syntax tree =
that fully<br>
&gt; =A0 =A0 =A0delimits each piece of the lujvo.<br>
&gt;<br>
&gt; =A0 =A0 =A0-Alan<br>
&gt; =A0 =A0 =A0On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote=
:<br>
&gt; =A0 =A0 =A0&gt; Actually I guess that was a bad example at the end bec=
ause a lujvo<br>
&gt; =A0 =A0 =A0ending<br>
&gt; =A0 =A0 =A0&gt; with &quot;rat&quot; would definitely be wrong. But yo=
u get where I&#39;m going with<br>
&gt; =A0 =A0 =A0it.<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen<br>
</div>&gt; =A0 =A0 =A0&lt;[1][3]<a href=3D"mailto:lukeabergen@gmail.com">lu=
keabergen@gmail.com</a>&gt;<br>
<div><div></div><div class=3D"h5">&gt; =A0 =A0 =A0&gt; wrote:<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; Sorry, yes, I was providing very rough pseudocode for =
my script. I do<br>
&gt; =A0 =A0 =A0&gt; look from left to right. But since rafsi are always 3 =
letters (minus<br>
&gt; =A0 =A0 =A0any<br>
&gt; =A0 =A0 =A0&gt; &#39; characters and excluding 4 letter rafsi), I take=
 them in chunks of<br>
&gt; =A0 =A0 =A03.<br>
&gt; =A0 =A0 =A0&gt; an example with morsi would be &quot;xamymro&quot;. My=
 code would go like:<br>
&gt; =A0 =A0 =A0&gt; grab left most three chars, check for .y&#39;ys and gr=
ab a fourth char if<br>
&gt; =A0 =A0 =A0&gt; there is a .y&#39;y<br>
&gt; =A0 =A0 =A0&gt; look up the rafsi, chop off what you found to be the &=
quot;leftmost&quot; rafsi<br>
&gt; =A0 =A0 =A0&gt; and loop again with what you have left<br>
&gt; =A0 =A0 =A0&gt; Now we&#39;re looking at &quot;ymro&quot;<br>
&gt; =A0 =A0 =A0&gt; Strip off &quot;y&quot; and we&#39;re left with &quot;=
mro&quot;. Now because I&#39;m assuming that<br>
&gt; =A0 =A0 =A0&gt; &quot;r&quot;, &quot;l&quot;, &quot;m&quot;, or &quot;=
n&quot; followed by a consonant is a buffer consonant, I see<br>
&gt; =A0 =A0 =A0&gt; &quot;mro&quot; and think &quot;ok, the &#39;m&#39; is=
 a buffer consonant so grab another char so<br>
&gt; =A0 =A0 =A0&gt; we&#39;re back to a 3 letter rafsi&quot;, I then try t=
o grab whatever comes<br>
&gt; =A0 =A0 =A0after<br>
&gt; =A0 =A0 =A0&gt; &quot;o&quot; and get a null-pointer or some such.<br>
&gt; =A0 =A0 =A0&gt; It just occurred to me that I might deal with 4 letter=
 rafsi by<br>
&gt; =A0 =A0 =A0keeping<br>
&gt; =A0 =A0 =A0&gt; in mind that they always end with &quot;y&quot;. So my=
 revised &quot;grab leftmost<br>
&gt; =A0 =A0 =A0&gt; rafsi&quot; code would look something like:<br>
&gt; =A0 =A0 =A0&gt; word =3D xajmymro<br>
&gt; =A0 =A0 =A0&gt; if (word =3D &quot;....y&quot;) // where this is &quot=
;word&quot; =3D any 4 characters<br>
&gt; =A0 =A0 =A0followed<br>
&gt; =A0 =A0 =A0&gt; by an &quot;y&quot;<br>
&gt; =A0 =A0 =A0&gt; return substring(word, 0, 4)<br>
&gt; =A0 =A0 =A0&gt; Then in the calling function I just have to look for g=
ismu of the form<br>
&gt; =A0 =A0 =A0&gt; rafsi+a, rafsi+e, etc... till I find one that matches =
a gismu.<br>
&gt; =A0 =A0 =A0&gt; I&#39;m still stuck on the buffer consonant problem th=
ough.<br>
&gt; =A0 =A0 =A0&gt; It feels wrong to use guesswork like &quot;if you see =
[r|l|m|n]C then check<br>
&gt; =A0 =A0 =A0&gt; to see if it&#39;s a valid rafsi, if it&#39;s not, str=
ip off the [r|l|m|n],<br>
&gt; =A0 =A0 =A0grab<br>
&gt; =A0 =A0 =A0&gt; another char from the right, and look THAT up and see =
if it&#39;s a<br>
&gt; =A0 =A0 =A0rafsi&quot;.<br>
&gt; =A0 =A0 =A0&gt; Here&#39;s a non-code way to think of the problem. How=
 would a parser<br>
&gt; =A0 =A0 =A0figure<br>
&gt; =A0 =A0 =A0&gt; out whether &quot;co&#39;amrobratroci&quot; is &quot;c=
o&#39;a mro bra troci&quot; or &quot;co&#39;a m rob<br>
&gt; =A0 =A0 =A0rat<br>
&gt; =A0 =A0 =A0&gt; ro ci&quot;?<br>
&gt; =A0 =A0 =A0&gt; On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post.<br>
</div></div><div><div></div><div class=3D"h5">&gt; =A0 =A0 =A0&gt; &lt;[2][=
4]<a href=3D"mailto:alyn.post@lodockikumazvati.org">alyn.post@lodockikumazv=
ati.org</a>&gt; wrote:<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen =
wrote:<br>
&gt; =A0 =A0 =A0&gt; &gt; When I first started learning lojban I wrote up a=
 quick&#39;n dirty<br>
&gt; =A0 =A0 =A0&gt; script to<br>
&gt; =A0 =A0 =A0&gt; &gt; make looking up words faster and easier. gismu an=
d cmavo were easy,<br>
&gt; =A0 =A0 =A0&gt; but I<br>
&gt; =A0 =A0 =A0&gt; &gt; could never figure out lujvo. So I&#39;m taking a=
nother stab at it. I<br>
&gt; =A0 =A0 =A0&gt; &gt; currently have something that works in the genera=
l cases of<br>
&gt; =A0 =A0 =A0&gt; {bajdri},<br>
&gt; =A0 =A0 =A0&gt; &gt; {ba&#39;udri}, and {bagypau}. But currently I&#39=
;m not sure how to deal<br>
&gt; =A0 =A0 =A0&gt; with 4<br>
&gt; =A0 =A0 =A0&gt; &gt; letter rafsi and non &quot;y&quot; buffer letters=
.<br>
&gt; =A0 =A0 =A0&gt; &gt; To deal with the non &quot;y&quot; buffer letters=
 I thought I could just say:<br>
&gt; =A0 =A0 =A0&gt; &gt; strip all &quot;y&quot; from the word<br>
&gt; =A0 =A0 =A0&gt; &gt; get first three non &quot;&#39;&quot; chars<br>
&gt; =A0 =A0 =A0&gt; &gt; if the first letter is &quot;r&quot;, &quot;l&quo=
t;, &quot;m&quot;, or &quot;n&quot; and the second letter<br>
&gt; =A0 =A0 =A0&gt; is a<br>
&gt; =A0 =A0 =A0&gt; &gt; consonant, then chop off the first letter and gra=
b another letter<br>
&gt; =A0 =A0 =A0&gt; from the<br>
&gt; =A0 =A0 =A0&gt; &gt; right<br>
&gt; =A0 =A0 =A0&gt; &gt; (so if I was parsing &quot;bacru zei bevri&quot; =
=3D &quot;ba&#39;urbei&quot; I would (after<br>
&gt; =A0 =A0 =A0&gt; &gt; handling ba&#39;u in the first iteration) end up =
with &quot;rbe&quot; and due to<br>
&gt; =A0 =A0 =A0&gt; the<br>
&gt; =A0 =A0 =A0&gt; &gt; above step, I&#39;d strip off the &quot;r&quot; a=
nd grab the next letter thus<br>
&gt; =A0 =A0 =A0&gt; ending<br>
&gt; =A0 =A0 =A0&gt; &gt; with &quot;bei&quot; which is the right result).<=
br>
&gt; =A0 =A0 =A0&gt; &gt; But this produces strange results because there A=
RE cases where<br>
&gt; =A0 =A0 =A0&gt; buffer<br>
&gt; =A0 =A0 =A0&gt; &gt; letters are followed by consonants (morsi for ins=
tance).<br>
&gt; =A0 =A0 =A0&gt; &gt; Is there a way to unambiguously and algorithmica=
lly break a lujvo<br>
&gt; =A0 =A0 =A0&gt; down<br>
&gt; =A0 =A0 =A0&gt; &gt; into its component gismu?<br>
&gt; =A0 =A0 =A0&gt; &gt;<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; I haven&#39;t rigorously looked at this, so please exc=
use me if I&#39;m way<br>
&gt; =A0 =A0 =A0&gt; off base.<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; What if you start at the left side of the word and mat=
ch characters<br>
&gt; =A0 =A0 =A0&gt; until you get a matching rafsi, then look for optional=
 buffer<br>
&gt; =A0 =A0 =A0&gt; characters before matching your next rafsi, &amp;c? Yo=
u could be much<br>
&gt; =A0 =A0 =A0&gt; more sophisticated by adding detection for valid lerfu=
 clustering<br>
&gt; =A0 =A0 =A0&gt; to throw out what would otherwise be an ambiguous case=
.<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; It sounds like you&#39;re working top down on the prob=
lem rather than<br>
&gt; =A0 =A0 =A0&gt; going from left to right, but I don&#39;t know what is=
 wrong with my<br>
&gt; =A0 =A0 =A0&gt; suggestion yet.<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; I see you&#39;ve provided 3 simple examples, but can y=
ou provide an<br>
&gt; =A0 =A0 =A0&gt; example for morsi which you mention at the end?<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; -Alan<br>
&gt; =A0 =A0 =A0&gt; --<br>
&gt; =A0 =A0 =A0&gt; .i ko djuno fi le do sevzi<br>
&gt; =A0 =A0 =A0&gt; --<br>
&gt; =A0 =A0 =A0&gt; You received this message because you are subscribed t=
o the Google<br>
&gt; =A0 =A0 =A0&gt; Groups &quot;lojban&quot; group.<br>
</div></div>&gt; =A0 =A0 =A0&gt; To post to this group, send email to [3][5=
]<a href=3D"mailto:lojban@googlegroups.com">lojban@googlegroups.com</a>.<br=
>
<div class=3D"im">&gt; =A0 =A0 =A0&gt; To unsubscribe from this group, send=
 email to<br>
</div>&gt; =A0 =A0 =A0&gt; [4][6]<a href=3D"mailto:lojban%2Bunsubscribe@goo=
glegroups.com">lojban+unsubscribe@googlegroups.com</a>.<br>
<div class=3D"im">&gt; =A0 =A0 =A0&gt; For more options, visit this group a=
t<br>
</div>&gt; =A0 =A0 =A0&gt; [5][7]<a href=3D"http://groups.google.com/group/=
lojban?hl=3Den" target=3D"_blank">http://groups.google.com/group/lojban?hl=
=3Den</a>.<br>
<div class=3D"im">&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; --<br>
&gt; =A0 =A0 =A0&gt; You received this message because you are subscribed t=
o the Google<br>
&gt; =A0 =A0 =A0Groups<br>
&gt; =A0 =A0 =A0&gt; &quot;lojban&quot; group.<br>
</div>&gt; =A0 =A0 =A0&gt; To post to this group, send email to [8]<a href=
=3D"mailto:lojban@googlegroups.com">lojban@googlegroups.com</a>.<br>
<div class=3D"im">&gt; =A0 =A0 =A0&gt; To unsubscribe from this group, send=
 email to<br>
</div>&gt; =A0 =A0 =A0&gt; [9]<a href=3D"mailto:lojban%2Bunsubscribe@google=
groups.com">lojban+unsubscribe@googlegroups.com</a>.<br>
<div class=3D"im">&gt; =A0 =A0 =A0&gt; For more options, visit this group a=
t<br>
</div>&gt; =A0 =A0 =A0&gt; [10]<a href=3D"http://groups.google.com/group/lo=
jban?hl=3Den" target=3D"_blank">http://groups.google.com/group/lojban?hl=3D=
en</a>.<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; References<br>
&gt; =A0 =A0 =A0&gt;<br>
&gt; =A0 =A0 =A0&gt; Visible links<br>
&gt; =A0 =A0 =A0&gt; 1. mailto:[11]<a href=3D"mailto:lukeabergen@gmail.com"=
>lukeabergen@gmail.com</a><br>
&gt; =A0 =A0 =A0&gt; 2. mailto:[12]<a href=3D"mailto:alyn.post@lodockikumaz=
vati.org">alyn.post@lodockikumazvati.org</a><br>
&gt; =A0 =A0 =A0&gt; 3. mailto:[13]<a href=3D"mailto:lojban@googlegroups.co=
m">lojban@googlegroups.com</a><br>
&gt; =A0 =A0 =A0&gt; 4. mailto:[14]<a href=3D"mailto:lojban%252Bunsubscribe=
@googlegroups.com">lojban%2Bunsubscribe@googlegroups.com</a><br>
&gt; =A0 =A0 =A0&gt; 5. [15]<a href=3D"http://groups.google.com/group/lojba=
n?hl=3Den" target=3D"_blank">http://groups.google.com/group/lojban?hl=3Den<=
/a><br>
<div class=3D"im">&gt; =A0 =A0 =A0--<br>
&gt; =A0 =A0 =A0.i ko djuno fi le do sevzi<br>
&gt;<br>
&gt; =A0 =A0 =A0--<br>
&gt; =A0 =A0 =A0You received this message because you are subscribed to the=
 Google<br>
&gt; =A0 =A0 =A0Groups &quot;lojban&quot; group.<br>
</div>&gt; =A0 =A0 =A0To post to this group, send email to [16]<a href=3D"m=
ailto:lojban@googlegroups.com">lojban@googlegroups.com</a>.<br>
<div class=3D"im">&gt; =A0 =A0 =A0To unsubscribe from this group, send emai=
l to<br>
</div>&gt; =A0 =A0 =A0[17]<a href=3D"mailto:lojban%2Bunsubscribe@googlegrou=
ps.com">lojban+unsubscribe@googlegroups.com</a>.<br>
<div class=3D"im">&gt; =A0 =A0 =A0For more options, visit this group at<br>
</div>&gt; =A0 =A0 =A0[18]<a href=3D"http://groups.google.com/group/lojban?=
hl=3Den" target=3D"_blank">http://groups.google.com/group/lojban?hl=3Den</a=
>.<br>
<div class=3D"im">&gt;<br>
&gt;<br>
&gt; References<br>
&gt;<br>
&gt; =A0 =A0Visible links<br>
</div>&gt; =A0 =A01. mailto:<a href=3D"mailto:alyn.post@lodockikumazvati.or=
g">alyn.post@lodockikumazvati.org</a><br>
&gt; =A0 =A02. <a href=3D"http://www.lojban.org/tiki/tiki-index.php?page=3D=
BPFK+Section%3A+PEG+Morphology+Algorithm" target=3D"_blank">http://www.lojb=
an.org/tiki/tiki-index.php?page=3DBPFK+Section%3A+PEG+Morphology+Algorithm<=
/a><br>

<div class=3D"im">&gt; =A0 =A03. mailto:<a href=3D"mailto:lukeabergen@gmail=
.com">lukeabergen@gmail.com</a><br>
&gt; =A0 =A04. mailto:<a href=3D"mailto:alyn.post@lodockikumazvati.org">aly=
n.post@lodockikumazvati.org</a><br>
&gt; =A0 =A05. mailto:<a href=3D"mailto:lojban@googlegroups.com">lojban@goo=
glegroups.com</a><br>
&gt; =A0 =A06. mailto:<a href=3D"mailto:lojban%252Bunsubscribe@googlegroups=
.com">lojban%2Bunsubscribe@googlegroups.com</a><br>
</div>&gt; =A0 =A07. <a href=3D"http://groups.google.com/group/lojban?hl=3D=
en" target=3D"_blank">http://groups.google.com/group/lojban?hl=3Den</a><br>
<div class=3D"im">&gt; =A0 =A08. mailto:<a href=3D"mailto:lojban@googlegrou=
ps.com">lojban@googlegroups.com</a><br>
&gt; =A0 =A09. mailto:<a href=3D"mailto:lojban%252Bunsubscribe@googlegroups=
.com">lojban%2Bunsubscribe@googlegroups.com</a><br>
</div>&gt; =A0 10. <a href=3D"http://groups.google.com/group/lojban?hl=3Den=
" target=3D"_blank">http://groups.google.com/group/lojban?hl=3Den</a><br>
&gt; =A0 11. mailto:<a href=3D"mailto:lukeabergen@gmail.com">lukeabergen@gm=
ail.com</a><br>
&gt; =A0 12. mailto:<a href=3D"mailto:alyn.post@lodockikumazvati.org">alyn.=
post@lodockikumazvati.org</a><br>
&gt; =A0 13. mailto:<a href=3D"mailto:lojban@googlegroups.com">lojban@googl=
egroups.com</a><br>
&gt; =A0 14. mailto:<a href=3D"mailto:lojban%25252Bunsubscribe@googlegroups=
.com">lojban%252Bunsubscribe@googlegroups.com</a><br>
&gt; =A0 15. <a href=3D"http://groups.google.com/group/lojban?hl=3Den" targ=
et=3D"_blank">http://groups.google.com/group/lojban?hl=3Den</a><br>
&gt; =A0 16. mailto:<a href=3D"mailto:lojban@googlegroups.com">lojban@googl=
egroups.com</a><br>
&gt; =A0 17. mailto:<a href=3D"mailto:lojban%252Bunsubscribe@googlegroups.c=
om">lojban%2Bunsubscribe@googlegroups.com</a><br>
&gt; =A0 18. <a href=3D"http://groups.google.com/group/lojban?hl=3Den" targ=
et=3D"_blank">http://groups.google.com/group/lojban?hl=3Den</a><br>
<div><div></div><div class=3D"h5"><br>
--<br>
.i ko djuno fi le do sevzi<br>
<br>
<br>
</div></div></blockquote></div><br></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups "=
lojban" group.<br />
To post to this group, send email to lojban@googlegroups.com.<br />
To unsubscribe from this group, send email to lojban+unsubscribe@googlegrou=
ps.com.<br />

For more options, visit this group at http://groups.google.com/group/lojban=
?hl=3Den.<br />



--0022152d6e5932b9fe0493c594a9--