Re: [lojban] Re: valfendi algorithm



On Friday 24 January 2003 20:31, Robert LeChevalier wrote:
> My understanding of:
> >A slinku'i, as far as word breaking is concerned, is anything that matches
> >the following regex:
> >^C[raf3]*([gim]?$|[raf4]?y)
> >where
> >C matches any consonant
> >[raf3] matches any 3-letter rafsi
> >[raf4] matches any 4-letter rafsi
> >[gim] matches any gismu.
>
> A correct algorithm would use the structures CVC/CVV/CCV for raf3,
> CVCC/CCVC for raf4 and CVCCV/CCVCV for gim.  It doesn't matter whether the
> values are in fact actually used.  Post-freeze it seems logical that it
> would and should be easier to add and subtract from the gismu/rafsi lists
> than to change the entire morphology, so the morphology is defined at a
> higher level than the specific list of words.

The program matches the structures, not a list of words, and I meant the 
algorithm to do so also. If the algorithm is unclear, check the program. If 
they disagree, tell me. I will use a list of words when I write the part that 
analyzes a lujvo into rafsi and looks them up; if a rafsi is not in the list 
it will say "?", e.g. {zbekyxoxmau} will be analyzed as {zbek? ? zmadu}.
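
For readers without the program at hand, here is a minimal sketch of the quoted regex with the letter shapes substituted for the word lists. It is not the valfendi code. The names C, V, RAF3, RAF4, GIM and looks_like_slinkuhi are made up for illustration, CV'V is counted as a CVV rafsi (an assumption), and consonant clusters are not checked for permissibility.

```python
import re

# Assumed letter classes: the 17 Lojban consonants and the 5 ordinary vowels.
C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Shapes only, no word lists. CV'V is treated as a CVV rafsi (assumption).
RAF3 = f"(?:{C}{V}{C}|{C}{V}'?{V}|{C}{C}{V})"
RAF4 = f"(?:{C}{V}{C}{C}|{C}{C}{V}{C})"
GIM = f"(?:{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V})"

# ^C[raf3]*([gim]?$|[raf4]?y) from the quoted message, written with shapes.
SLINKUHI = re.compile(f"^{C}{RAF3}*(?:{GIM}?$|{RAF4}?y)")

def looks_like_slinkuhi(text):
    """Return True if text matches the shape-based slinku'i pattern."""
    return SLINKUHI.match(text) is not None

print(looks_like_slinkuhi("slinku'i"))  # s + lin + ku'i
print(looks_like_slinkuhi("smabru"))    # s + gismu-shaped mabru
print(looks_like_slinkuhi("klama"))     # an ordinary gismu
```

Because only shapes are tested, adding or removing entries from the gismu or rafsi lists would not change the result.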

> (In addition "ala'um" is not an "option"; there should be no options in an
> official algorithm.  It is either valid or invalid according to the rules.)

The Book is gricingly unclear about this detail:

 Names are not permitted to have the sequences ``la'', ``lai'', or ``doi'' 
embedded in them, unless the sequence is immediately preceded by a consonant.

Since anything that contains the sequence "lai" contains the sequence "la", 
and following "la" or "lai" with a vowel makes it unbreakable just as 
preceding it with a consonant does, I griced it to mean "...preceded by a 
consonant or followed by a vowel". But if that were the case, why isn't 
"la'i" mentioned? A few lines later it says "No cmene may have the syllables 
``la'', ``lai'', or ``doi'' in them, unless immediately preceded by a 
consonant." In {laus}, "la" is a sequence, but not a syllable. In {la,us}, it 
is both a sequence and a syllable. But the presence or absence of commas in a 
word makes no difference to the identity or validity of the word. So is the 
name valid or not? {laus} cannot be broken into {la ,us}, nor {ala'um} into {a la 
'um}, because a word cannot begin with an apostrophe or with a pauseless 
vowel.
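
To make the two readings concrete, here is a small sketch that checks a name under both. It is only an illustration, not part of the program. The function name offending and the flag allow_following_vowel are hypothetical, y is counted as a vowel, and a following apostrophe is treated like a following vowel (both assumptions). It says nothing about the "la'i" question above, and it does not check that a name ends in a consonant.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiouy")  # counting y as a vowel is an assumption
FORBIDDEN = ("la", "lai", "doi")

def offending(name, allow_following_vowel=False):
    """List (sequence, index) pairs that break the rule in a lowercase name.

    Strict reading: a forbidden sequence is fine only if it is immediately
    preceded by a consonant. With allow_following_vowel=True it is also
    fine when immediately followed by a vowel or an apostrophe.
    """
    s = name.replace(",", "")  # commas do not change the word
    hits = []
    for seq in FORBIDDEN:
        start = s.find(seq)
        while start != -1:
            before = s[start - 1] if start > 0 else ""
            after = s[start + len(seq):start + len(seq) + 1]
            ok = before in CONSONANTS
            if allow_following_vowel and after and (after in VOWELS or after == "'"):
                ok = True
            if not ok:
                hits.append((seq, start))
            start = s.find(seq, start + 1)
    return hits

for name in ("laus", "la,us", "ala'um"):
    print(name, offending(name), offending(name, allow_following_vowel=True))
```

Under the strict reading both {laus} and {ala'um} are flagged; under the looser reading neither is, and {laus} and {la,us} always get the same answer because the commas are dropped first.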

phma