Re: [lojban] lujvo deconstruction
I think your message here contains the kernel of the solution,
namely that you can't just chop off three letters and call that a
rafsi, but you must grab three (or four) letters that form a valid
rafsi or it isn't one.
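As a rough illustration of "letters that form a valid rafsi", here is a Python sketch that checks only the *shape* of a candidate chunk. It assumes the rafsi shapes CVC, CCV, CVV, CV'V, CVCC and CCVC, and the 48 permissible initial consonant pairs, as I understand them from The Complete Lojban Language; the function names are hypothetical, and no rafsi dictionary is consulted.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")  # "y" deliberately excluded: it never appears inside a rafsi

# Assumed list: the 48 permissible initial consonant pairs (CLL, chapter 3).
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)


def cv_pattern(chunk):
    """Map each letter to C, V or ' ; return None for any other character."""
    out = []
    for ch in chunk:
        if ch in CONSONANTS:
            out.append("C")
        elif ch in VOWELS:
            out.append("V")
        elif ch == "'":
            out.append("'")
        else:
            return None
    return "".join(out)


def rafsi_shape(chunk):
    """Return the rafsi shape of chunk (e.g. "CVC"), or None if it has none.

    Shape-only check: it skips finer rules such as which vowel pairs are
    allowed in CVV rafsi or which consonant pairs may sit inside CVCC.
    """
    pattern = cv_pattern(chunk)
    if pattern in ("CVC", "CVV", "CV'V", "CVCC"):
        return pattern
    # CCV and CCVC must start with a permissible initial pair.
    if pattern in ("CCV", "CCVC") and chunk[:2] in INITIAL_PAIRS:
        return pattern
    return None
```

In the second reading of "co'amrobratroci" below ("co'a m rob rat ro ci"), the pieces "m", "ro" and "ci" all come back as None, so that split can be rejected on shape alone.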
The PEG grammar for Lojban morphology, at
http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm
shows what makes a valid lujvo, and the process is more subtle than
"grab three letters and pretend they're a rafsi." But if you follow
the formal grammar, you'll get an abstract syntax tree that fully
delimits each piece of the lujvo.
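Following the formal grammar is the thorough route. As a much smaller illustration of the left-to-right idea discussed in this thread (match a known rafsi, allow an optional hyphen, repeat, and backtrack when a branch fails), here is a Python sketch. It is not the PEG grammar: RAFSI is a hypothetical table holding only the rafsi from the examples in this thread, HYPHENS assumes "y", "r" and "n" are the hyphen letters, and the real rules about where each hyphen may appear and what may end a lujvo are left out.

```python
# Hypothetical rafsi table: only pieces from this thread's examples,
# not the official rafsi list.
RAFSI = ("ba'u", "bei", "xam", "mro")
# Assumed hyphen letters that may sit between two rafsi.
HYPHENS = ("y", "r", "n")


def segment(word):
    """Return every split of word into known rafsi and hyphens, left to right."""
    results = []

    def walk(pos, parts):
        if pos == len(word):
            results.append(parts)
            return
        for rafsi in RAFSI:
            if not word.startswith(rafsi, pos):
                continue
            end = pos + len(rafsi)
            # Case 1: the next rafsi (or the end of the word) follows directly.
            walk(end, parts + [rafsi])
            # Case 2: a hyphen follows, and more material comes after it.
            if end + 1 < len(word) and word[end] in HYPHENS:
                walk(end + 1, parts + [rafsi, word[end]])

    walk(0, [])
    return results
```

With this table, segment("ba'urbei") returns the single split ba'u / r / bei, segment("xamymro") returns xam / y / mro, and a string with no complete split, such as "ba'urbe", returns an empty list. Because every branch is explored, a genuinely ambiguous string would show up as more than one result.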
-Alan
On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
> Actually I guess that was a bad example at the end because a lujvo ending
> with "rat" would definitely be wrong. But you get where I'm going with it.
>
> On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <[1]lukeabergen@gmail.com>
> wrote:
>
> Sorry, yes, I was providing very rough pseudocode for my script. I do
> look from left to right. But since rafsi are always 3 letters (not
> counting any ' characters, and excluding 4-letter rafsi), I take them in
> chunks of 3. An example with morsi would be "xamymro". My code would go
> like this:
> 1. Grab the leftmost three chars, check for a .y'y, and grab a fourth
>    char if there is one.
> 2. Look up the rafsi, chop off what you found to be the leftmost rafsi,
>    and loop again with what you have left.
> Now we're looking at "ymro". Strip off the "y" and we're left with
> "mro". Because I'm assuming that "r", "l", "m", or "n" followed by a
> consonant is a buffer consonant, I see "mro" and think "OK, the 'm' is a
> buffer, so grab another char so we're back to a 3-letter rafsi". I then
> try to grab whatever comes after the "o" and get a null pointer or some
> such.
> It just occurred to me that I might deal with 4-letter rafsi by keeping
> in mind that they are always followed by a "y". So my revised "grab
> leftmost rafsi" code would look something like:
>
>   word = "xajmymro"
>   if word matches "....y"   // i.e. any 4 characters followed by a "y"
>       return substring(word, 0, 4)
>
> Then in the calling function I just have to look for gismu of the form
> rafsi+a, rafsi+e, etc., until I find one that matches a gismu.
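The 4-letter rule and the "rafsi + vowel" lookup above could be sketched in Python like this. GISMU is a hypothetical stand-in for a real gismu list, the function names are made up for illustration, and the regex only checks for four letters other than "y" (and other than an apostrophe) followed by a "y".

```python
import re

# Hypothetical stand-in for a real gismu dictionary.
GISMU = {"xajmi", "morsi", "bacru", "bevri"}

# Four letters other than "y" (a-x or z), followed by a "y".
FOUR_LETTER_RAFSI = re.compile(r"([a-xz]{4})y")


def leading_four_letter_rafsi(word):
    """Return the leading 4-letter rafsi if the word starts with one plus "y"."""
    match = FOUR_LETTER_RAFSI.match(word)
    return match.group(1) if match else None


def gismu_for(rafsi4):
    """Try rafsi4 + a, e, i, o, u and keep only the words found in GISMU."""
    return [rafsi4 + vowel for vowel in "aeiou" if rafsi4 + vowel in GISMU]
```

Here leading_four_letter_rafsi("xajmymro") gives "xajm", and gismu_for("xajm") then finds "xajmi" in the stand-in list; "xamymro" gives None, since its first rafsi "xam" has only three letters.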
> I'm still stuck on the buffer consonant problem, though.
> It feels wrong to use guesswork like "if you see [r|l|m|n]C, check
> whether it's a valid rafsi; if it's not, strip off the [r|l|m|n], grab
> another char from the right, and look THAT up to see if it's a rafsi".
> Here's a non-code way to think of the problem. How would a parser figure
> out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat
> ro ci"?
> On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post.
> <[2]alyn.post@lodockikumazvati.org> wrote:
>
> On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> > When I first started learning lojban I wrote up a quick'n'dirty script
> > to make looking up words faster and easier. gismu and cmavo were easy,
> > but I could never figure out lujvo. So I'm taking another stab at it. I
> > currently have something that works in the general cases of {bajdri},
> > {ba'udri}, and {bagypau}, but I'm not sure how to deal with 4-letter
> > rafsi and non-"y" buffer letters.
> > To deal with the non-"y" buffer letters I thought I could just say:
> >   strip all "y" from the word
> >   get the first three non-"'" chars
> >   if the first letter is "r", "l", "m", or "n" and the second letter is
> >   a consonant, chop off the first letter and grab another letter from
> >   the right
> > (so if I was parsing "bacru zei bevri" = "ba'urbei", I would, after
> > handling ba'u in the first iteration, end up with "rbe", and due to the
> > above step I'd strip off the "r" and grab the next letter, thus ending
> > with "bei", which is the right result).
> > But this produces strange results because there ARE cases where buffer
> > letters are followed by consonants (morsi, for instance).
> > Is there a way to unambiguously and algorithmically break a lujvo down
> > into its component gismu?
> >
>
> I haven't rigorously looked at this, so please excuse me if I'm way
> off base.
>
> What if you start at the left side of the word and match characters
> until you get a matching rafsi, then look for optional buffer
> characters before matching your next rafsi, &c? You could be much
> more sophisticated by adding detection for valid lerfu clustering
> to throw out what would otherwise be an ambiguous case.
>
> It sounds like you're working top down on the problem rather than
> going from left to right, but I don't know what is wrong with my
> suggestion yet.
>
> I see you've provided 3 simple examples, but can you provide an
> example for morsi which you mention at the end?
>
> -Alan
> --
> .i ko djuno fi le do sevzi
> --
> You received this message because you are subscribed to the Google
> Groups "lojban" group.
> To post to this group, send email to [3]lojban@googlegroups.com.
> To unsubscribe from this group, send email to
> [4]lojban+unsubscribe@googlegroups.com.
> For more options, visit this group at
> [5]http://groups.google.com/group/lojban?hl=en.
>
> References
>
> Visible links
> 1. mailto:lukeabergen@gmail.com
> 2. mailto:alyn.post@lodockikumazvati.org
> 3. mailto:lojban@googlegroups.com
> 4. mailto:lojban%2Bunsubscribe@googlegroups.com
> 5. http://groups.google.com/group/lojban?hl=en