Re: [lojban] Re: valfendi algorithm
On Friday 24 January 2003 20:31, Robert LeChevalier wrote:
> My understanding of:
> >A slinku'i, as far as word breaking is concerned, is anything that matches
> >the following regex:
> >^C[raf3]*([gim]?$|[raf4]?y)
> >where
> >C matches any consonant
> >[raf3] matches any 3-letter rafsi
> >[raf4] matches any 4-letter rafsi
> >[gim] matches any gismu.
>
> A correct algorithm would use the structures CVC/CVV/CCV for raf3,
> CVCC/CCVC for raf4 and CVCCV/CCVCV for gim. It doesn't matter whether the
> values are in fact actually used. Post-freeze it seems logical that it
> would and should be easier to add and subtract from the gismu/rafsi lists
> than to change the entire morphology, so the morphology is defined at a
> higher level than the specific list of words.
The program matches the structures, not a list of words, and I meant the
algorithm to do so also. If the algorithm is unclear, check the program. If
they disagree, tell me. I will use a list of words when I write the part that
analyzes a lujvo into rafsi and looks them up; if a rafsi is not in the list
it will say "?", e.g. {zbekyxoxmau} will be analyzed as {zbek? ? zmadu}.
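To make the structure-based matching concrete, here is a minimal sketch of the quoted regex with the letter patterns substituted for the bracketed classes. It is not the valfendi program itself. The names (SLINKUHI, looks_like_slinkuhi) are made up for illustration, the optional apostrophe in the CVV pattern (to cover CV'V) is my assumption, and no check is made that consonant pairs are permissible.

```python
import re

# Letter classes; y is deliberately left out of V.
C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Structures named in the thread: CVC/CVV/CCV, CVCC/CCVC, CVCCV/CCVCV.
# Assumption: CV'V is folded into CVV via an optional apostrophe.
RAF3 = f"(?:{C}{V}{C}|{C}{V}'?{V}|{C}{C}{V})"
RAF4 = f"(?:{C}{V}{C}{C}|{C}{C}{V}{C})"
GIM = f"(?:{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V})"

# ^C[raf3]*([gim]?$|[raf4]?y) with the structures filled in.
SLINKUHI = re.compile(f"^{C}{RAF3}*(?:{GIM}?$|{RAF4}?y)")

def looks_like_slinkuhi(text):
    """Return True if text matches the slinku'i pattern by shape alone."""
    return SLINKUHI.match(text) is not None
```

Because only shapes are tested, looks_like_slinkuhi("slinku'i") is true whether or not {lin} and {ku'i} are assigned rafsi, while looks_like_slinkuhi("klama") is false.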
> (In addition "ala'um" is not an "option"; there should be no options in an
> official algorithm. It is either valid or invalid according to the rules.)
The Book is gricingly unclear about this detail:
Names are not permitted to have the sequences ``la'', ``lai'', or ``doi''
embedded in them, unless the sequence is immediately preceded by a consonant.
Since anything that contains the sequence "lai" contains the sequence "la",
and following "la" or "lai" with a vowel makes it unbreakable just as
preceding it with a consonant does, I griced it to mean "...preceded by a
consonant or followed by a vowel". But if that were the case, why isn't
"la'i" mentioned? A few lines later it says "No cmene may have the syllables
``la'', ``lai'', or ``doi'' in them, unless immediately preceded by a
consonant." In {laus}, "la" is a sequence, but not a syllable. In {la,us}, it
is both a sequence and a syllable. But the presence or absence of commas in a
word makes no difference to the identity or validity of the word. So is {laus}
valid or not? It cannot be broken into {la ,us}, nor can {ala'um} be broken
into {a la 'um}, because a word cannot begin with an apostrophe or with a
pauseless vowel.
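For comparison, this is a rough sketch of the strictest reading, taking "sequences" literally and allowing only the "preceded by a consonant" exception. It does not settle the question above; it just shows what that reading does to the two examples. The function name forbidden_sequences is hypothetical, and commas are dropped first on the assumption that they do not affect validity.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")

def forbidden_sequences(name):
    """List (index, sequence) pairs that the literal reading would reject.

    Only "la" and "doi" are searched for: every "lai" contains "la" at the
    same position, with the same preceding letter, so it needs no extra test.
    """
    text = name.replace(",", "")  # assumption: commas do not affect validity
    hits = []
    for seq in ("la", "doi"):
        start = text.find(seq)
        while start != -1:
            # Permitted only when immediately preceded by a consonant.
            if start == 0 or text[start - 1] not in CONSONANTS:
                hits.append((start, seq))
            start = text.find(seq, start + 1)
    return hits
```

Under this reading both {laus} and {ala'um} are rejected, with or without commas, even though neither can actually be broken at the "la".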
phma