
Re: [lojban] lujvo deconstruction



Sorry, yes, I was providing very rough pseudocode for my script. I do look from left to right. But since rafsi are always 3 letters long (not counting any apostrophes, and excluding 4-letter rafsi), I take them in chunks of 3.

An example with morsi would be "xamymro". My code would go like this:
Grab the leftmost three characters, checking for an apostrophe (.y'y); if there is one, grab a fourth character.
Look up the rafsi, chop off what was found to be the leftmost rafsi, and loop again with what is left.
Now we're looking at "ymro".
Strip off the "y" and we're left with "mro". Because I'm assuming that "r", "l", "m", or "n" followed by a consonant is a buffer consonant, I see "mro" and think "ok, the 'm' is a buffer letter, so grab another char so we're back to a 3-letter rafsi". I then try to grab whatever comes after the "o" and get a null-pointer error or some such.
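For comparison, here is a minimal Python sketch of just the chunking step with the r/l/m/n rule left out: take three letters at a time (an apostrophe doesn't count as a letter) and drop a "y" that sits between chunks. The name `split_chunks` is made up for illustration, and there is no rafsi lookup at all.

```python
def split_chunks(word):
    """Naively split a lujvo into 3-letter chunks, left to right.

    Apostrophes don't count toward the 3 letters, and a "y" found
    between chunks is dropped. No r/n/l/m hyphen handling at all.
    """
    chunks = []
    i = 0
    while i < len(word):
        if word[i] == "y":              # y-hyphen between chunks: skip it
            i += 1
            continue
        chunk, letters = "", 0
        while i < len(word) and letters < 3:
            chunk += word[i]
            if word[i] != "'":          # apostrophe is not a letter
                letters += 1
            i += 1
        chunks.append(chunk)
    return chunks

print(split_chunks("xamymro"))   # ['xam', 'mro']
print(split_chunks("ba'urbei"))  # ["ba'u", 'rbe', 'i']
```

Without the rule, "xamymro" comes apart correctly but "ba'urbei" leaves a stray "rbe" and "i": each fixed rule repairs one word and breaks another.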

It just occurred to me that I might deal with 4-letter rafsi by keeping in mind that, inside a lujvo, a 4-letter rafsi is always followed by a "y". So my revised "grab leftmost rafsi" code would look something like:

def grab_long_rafsi(word):                 # e.g. word = "xajmymro"
    if len(word) > 4 and word[4] == "y":   # any 4 characters followed by a "y"
        return word[:4]                    # "xajm", the 4-letter rafsi of xajmi
    return None                            # no 4-letter rafsi here; try 3 letters

Then, in the calling function, I just have to try rafsi+a, rafsi+e, and so on until one of them matches a real gismu.
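A minimal sketch of that calling-function step, assuming the gismu list is available as a Python set. `GISMU` below is a tiny hypothetical stand-in for a real list loaded from a dictionary file, and `gismu_for_long_rafsi` is a made-up name. It relies on a 4-letter rafsi being its gismu with the final vowel dropped.

```python
# Hypothetical stand-in for a real gismu list (e.g. loaded from a file).
GISMU = {"bacru", "bevri", "morsi", "xajmi", "xamgu"}

def gismu_for_long_rafsi(rafsi4, gismu=GISMU):
    """Return the gismu a 4-letter rafsi abbreviates, or None.

    A 4-letter rafsi is a gismu minus its final vowel, so append
    each vowel in turn and check the candidate against the list.
    """
    for vowel in "aeiou":
        candidate = rafsi4 + vowel
        if candidate in gismu:
            return candidate
    return None

print(gismu_for_long_rafsi("xajm"))  # xajmi
print(gismu_for_long_rafsi("zzzz"))  # None
```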

I'm still stuck on the buffer consonant problem though.

It feels wrong to use guesswork like "if you see [r|l|m|n]C, check whether it's a valid rafsi; if it's not, strip off the [r|l|m|n], grab another char from the right, and look THAT up to see if it's a rafsi".
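That "try it, and if it isn't a rafsi, back up" idea stops being guesswork if the search is made exhaustive: at each position, either treat the next letter as a hyphen or match a known rafsi, then recurse on the rest. A minimal sketch, assuming a hypothetical toy rafsi set (a real one would come from the official rafsi list) and treating "y", "r" and "n" as possible hyphens anywhere, which is looser than the real positional rules:

```python
# Hypothetical toy rafsi set; a real one would be loaded from the rafsi list.
RAFSI = {"ba'u", "bei", "xam", "xajm", "mro"}
HYPHENS = {"y", "r", "n"}

def segmentations(word, rafsi=RAFSI):
    """Return every way to cover `word` with known rafsi and hyphen letters.

    Each result lists only the rafsi; hyphens are consumed silently.
    Hyphens are allowed anywhere here, which is looser than the real rules.
    """
    if not word:
        return [[]]                      # exactly one way to cover nothing
    results = []
    if word[0] in HYPHENS:               # option 1: next letter is a hyphen
        results.extend(segmentations(word[1:], rafsi))
    for r in rafsi:                      # option 2: word starts with a rafsi
        if word.startswith(r):
            for rest in segmentations(word[len(r):], rafsi):
                results.append([r] + rest)
    return results

print(segmentations("ba'urbei"))   # [["ba'u", 'bei']]
print(segmentations("xajmymro"))   # [['xajm', 'mro']]
```

Because every alternative is explored, an "r" that doesn't start a known rafsi is simply tried as a hyphen instead. When more than one segmentation survives, the word is ambiguous with respect to the table, and further rules are needed to choose.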

Here's a non-code way to think of the problem.  How would a parser figure out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat ro ci"?
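Part of the answer can be checked without any dictionary: every rafsi has one of a handful of consonant/vowel shapes (CVC, CCV, CVV or CV'V, the 4-letter CVCC and CCVC, and the full 5-letter gismu at the end of a lujvo). A minimal sketch, with made-up function names, that tests the two readings of "co'amrobratroci" piece by piece, assuming hyphens have already been removed from each piece:

```python
VOWELS = set("aeiou")

# Consonant/vowel patterns a rafsi can have ("'" stands for the apostrophe).
RAFSI_SHAPES = {
    "CVC", "CCV", "CVV", "CV'V",   # 3-letter rafsi (apostrophe not counted)
    "CVCC", "CCVC",                 # 4-letter rafsi
    "CVCCV", "CCVCV",               # full gismu, only as the last piece
}

def shape(piece):
    """Map each letter to C or V, keeping apostrophes as they are."""
    return "".join(
        "'" if ch == "'" else ("V" if ch in VOWELS else "C") for ch in piece
    )

def is_rafsi_shape(piece):
    return shape(piece) in RAFSI_SHAPES

good = ["co'a", "mro", "bra", "troci"]
bad = ["co'a", "m", "rob", "rat", "ro", "ci"]
print(all(is_rafsi_shape(p) for p in good))  # True
print(all(is_rafsi_shape(p) for p in bad))   # False ("m", "ro", "ci")
```

The second reading fails on shape alone: "m", "ro" and "ci" fit no rafsi pattern. Treating "m" as a hyphen doesn't rescue it either, since as far as I know only "y", "r" and "n" serve as hyphens. Shape checks can't decide between readings whose pieces all have valid shapes, though; for those you still need the rafsi list.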

On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <alyn.post@lodockikumazvati.org> wrote:
On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>    When I first started learning lojban I wrote up a quick'n dirty script to
>    make looking up words faster and easier. gismu and cmavo were easy, but I
>    could never figure out lujvo. So I'm taking another stab at it. I
>    currently have something that works in the general cases of {bajdri},
>    {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
>    letter rafsi and non "y" buffer letters.
>    To deal with the non "y" buffer letters I thought I could just say:
>    strip all "y" from the word
>    get first three non "'" chars
>    if the first letter is "r", "l", "m", or "n" and the second letter is a
>    consonant, then chop off the first letter and grab another letter from the
>    right
>    (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>    handling ba'u in the first iteration) end up with "rbe" and due to the
>    above step, I'd strip off the "r" and grab the next letter thus ending
>    with "bei" which is the right result).
>    But this produces strange results because there ARE cases where buffer
>    letters are followed by consonants (morsi for instance).
>    Is there a way to un-ambiguously and algorithmically break a lujvo down
>    into its component gismu?
>

I haven't rigorously looked at this, so please excuse me if I'm way
off base.

What if you start at the left side of the word and match characters
until you get a matching rafsi, then look for optional buffer
characters before matching your next rafsi, &c?  You could be much
more sophisticated by adding detection for valid lerfu clustering
to throw out what would otherwise be an ambiguous case.
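The "valid lerfu clustering" check can be made concrete with the rules for which consonants may stand next to each other. A minimal sketch, assuming the permissible-pair rules as given in chapter 3 of The Complete Lojban Language (no doubled consonants; no voiced/unvoiced mix, with l, m, n, r exempt; no two of c, j, s, z; and the specific pairs cx, kx, xc, xk, mz excluded). The function names are hypothetical, and only the consonant-clash reason for a "y" hyphen is covered:

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SIBILANTS = set("cjsz")
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}

def permissible_pair(a, b):
    """True if consonants a and b may stand next to each other."""
    if a == b:                                   # no doubled consonants
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False                             # no voiced/unvoiced mix
    if a in SIBILANTS and b in SIBILANTS:        # no two of c, j, s, z
        return False
    return a + b not in FORBIDDEN                # specific excluded pairs

def needs_y(left_rafsi, right_rafsi):
    """True if the letters meeting at the join clash, forcing a "y".

    Other hyphen rules (after 4-letter rafsi, r/n after CVV) are not handled.
    """
    return not permissible_pair(left_rafsi[-1], right_rafsi[0])

print(needs_y("xam", "mro"))  # True  -> xamymro
print(needs_y("bag", "pau"))  # True  -> bagypau
print(needs_y("baj", "dri"))  # False -> bajdri
```

This accounts for the "y" in "xamymro" and "bagypau" but not in "bajdri", and it lets a parser reject any split whose rafsi would meet in a forbidden pair with no "y" between them.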

It sounds like you're working top down on the problem rather than
going from left to right, but I don't know what is wrong with my
suggestion yet.

I see you've provided 3 simple examples, but can you provide an
example for morsi which you mention at the end?

-Alan
--
.i ko djuno fi le do sevzi

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.
