Re: [lojban] lujvo deconstruction

To: lojban@googlegroups.com
Subject: Re: [lojban] lujvo deconstruction
From: ".alyn.post." <alyn.post@lodockikumazvati.org>
Date: Fri, 29 Oct 2010 11:44:11 -0600

I think your message here contains the kernel of the solution,
namely that you can't just chop off three letters and call that a
rafsi, but you must grab three (four) letters that form a valid
rafsi or it isn't one.

The PEG grammar for Lojban morphology

http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm

shows what makes a valid lujvo, and the process is more subtle than
"grab three letters and pretend they're a rafsi."  But if you follow
the formal grammar, you'll get an abstract syntax tree that fully
delimits each piece of the lujvo.
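
As a rough sketch of that idea (this is not the PEG grammar itself), here
is some Python that only accepts a split when every piece is found in a
rafsi table. The table is a tiny hand-picked subset covering the words in
this thread, the names RAFSI, HYPHENS, and split_lujvo are made up for
illustration, and I'm assuming that only y, r, and n act as hyphens. It
ignores consonant-cluster rules and everything else the real morphology
checks.

```python
# Toy rafsi table: a hypothetical subset covering only words from this thread.
RAFSI = {
    "co'a": "co'a",
    "mro": "morsi",
    "bra": "barda",
    "troci": "troci",
    "xam": "xajmi",
    "ba'u": "bacru",
    "bei": "bevri",
}

# Assumption: only these letters may appear as hyphens between rafsi.
HYPHENS = ("y", "r", "n")


def split_lujvo(word):
    """Return every way to cover `word` with known rafsi and optional hyphens."""
    if not word:
        return [[]]  # nothing left to match: one (empty) parse
    results = []
    for rafsi in RAFSI:
        if not word.startswith(rafsi):
            continue
        rest = word[len(rafsi):]
        # Try the remainder as-is, then with one leading hyphen letter skipped.
        candidates = [rest]
        if rest[:1] in HYPHENS and len(rest) > 1:
            candidates.append(rest[1:])
        for tail in candidates:
            for parse in split_lujvo(tail):
                results.append([rafsi] + parse)
    return results


if __name__ == "__main__":
    for word in ("co'amrobratroci", "xamymro", "ba'urbei"):
        for parse in split_lujvo(word):
            print(word, "=", " + ".join(RAFSI[r] for r in parse))
```

With this table, "co'amrobratroci" comes back with a single parse,
co'a + mro + bra + troci, and a word that cannot be fully covered by
table entries comes back with no parse at all rather than a guess.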

-Alan

On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
>    Actually I guess that was a bad example at the end because a lujvo ending
>    with "rat" would definitely be wrong. But you get where I'm going with it.
> 
>    On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <[1]lukeabergen@gmail.com>
>    wrote:
> 
>      Sorry, yes, I was providing very rough pseudocode for my script. I do
>      look from left to right. But since rafsi are always 3 letters (minus any
>      ' characters and excluding 4 letter rafsi), I take them in chunks of 3.
>      An example with morsi would be "xamymro". My code would go like:
>      grab left most three chars, check for .y'ys and grab a fourth char if
>      there is a .y'y
>      look up the rafsi, chop off what you found to be the "leftmost" rafsi
>      and loop again with what you have left
>      Now we're looking at "ymro"
>      Strip off "y" and we're left with "mro". Now because I'm assuming that
>      "r", "l", "m", or "n" followed by a consonant is a buffer letter, I see
>      "mro" and think "ok, the 'm' is a buffer letter so grab another char so
>      we're back to a 3 letter rafsi". I then try to grab whatever comes after
>      "o" and get a null-pointer or some such.
>      It just occurred to me that I might deal with 4 letter rafsi by keeping
>      in mind that they always end with "y". So my revised "grab leftmost
>      rafsi" code would look something like:
>      word = "xajmymro"
>      if (word matches /^....y/) // i.e. any 4 characters followed by a "y"
>          return substring(word, 0, 4)
>      Then in the calling function I just have to look for gismu of the form
>      rafsi+a, rafsi+e, etc... till I find one that matches a gismu.
>      I'm still stuck on the buffer consonant problem though.
>      It feels wrong to use guesswork like "if you see [r|l|m|n]C then check
>      to see if it's a valid rafsi, if it's not, strip off the [r|l|m|n], grab
>      another char from the right, and look THAT up and see if it's a rafsi".
>      Here's a non-code way to think of the problem. How would a parser figure
>      out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat
>      ro ci"?
>      On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post.
>      <[2]alyn.post@lodockikumazvati.org> wrote:
> 
>        On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>        > When I first started learning lojban I wrote up a quick'n dirty script to
>        > make looking up words faster and easier. gismu and cmavo were easy, but I
>        > could never figure out lujvo. So I'm taking another stab at it. I
>        > currently have something that works in the general cases of {bajdri},
>        > {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
>        > letter rafsi and non "y" buffer letters.
>        > To deal with the non "y" buffer letters I thought I could just say:
>        > strip all "y" from the word
>        > get first three non "'" chars
>        > if the first letter is "r", "l", "m", or "n" and the second letter is a
>        > consonant, then chop off the first letter and grab another letter from the
>        > right
>        > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>        > handling ba'u in the first iteration) end up with "rbe" and due to the
>        > above step, I'd strip off the "r" and grab the next letter thus ending
>        > with "bei" which is the right result).
>        > But this produces strange results because there ARE cases where buffer
>        > letters are followed by consonants (morsi for instance).
>        > Is there a way to un-ambiguously and algorithmically break a lujvo down
>        > into its component gismu?
>        >
> 
>        I haven't rigorously looked at this, so please excuse me if I'm way
>        off base.
> 
>        What if you start at the left side of the word and match characters
>        until you get a matching rafsi, then look for optional buffer
>        characters before matching your next rafsi, &c? You could be much
>        more sophisticated by adding detection for valid lerfu clustering
>        to throw out what would otherwise be an ambiguous case.
> 
>        It sounds like you're working top down on the problem rather than
>        going from left to right, but I don't know what is wrong with my
>        suggestion yet.
> 
>        I see you've provided 3 simple examples, but can you provide an
>        example for morsi which you mention at the end?
> 
>        -Alan
>        --
>        .i ko djuno fi le do sevzi
>        --
>        You received this message because you are subscribed to the Google
>        Groups "lojban" group.
>        To post to this group, send email to [3]lojban@googlegroups.com.
>        To unsubscribe from this group, send email to
>        [4]lojban+unsubscribe@googlegroups.com.
>        For more options, visit this group at
>        [5]http://groups.google.com/group/lojban?hl=en.
> 
> References
> 
>    Visible links
>    1. mailto:lukeabergen@gmail.com
>    2. mailto:alyn.post@lodockikumazvati.org
>    3. mailto:lojban@googlegroups.com
>    4. mailto:lojban%2Bunsubscribe@googlegroups.com
>    5. http://groups.google.com/group/lojban?hl=en

-- 
.i ko djuno fi le do sevzi

