From phma@webjockey.net Fri Jan 24 18:21:40 2003
Organization: dis
To: lojban@yahoogroups.com
Subject: Re: [lojban] Re: valfendi algorithm
Date: Fri, 24 Jan 2003 21:21:36 -0500
User-Agent: KMail/1.5
References: <5.2.0.9.0.20030124074752.0360aec0@pop.east.cox.net> <5.2.0.9.0.20030124202537.03d9ab60@pop.east.cox.net>
In-Reply-To: <5.2.0.9.0.20030124202537.03d9ab60@pop.east.cox.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200301242121.36960.phma@webjockey.net>
From: Pierre Abbat <phma@webjockey.net>
X-Yahoo-Group-Post: member; u=92712300

On Friday 24 January 2003 20:31, Robert LeChevalier wrote:
> My understanding of:
> >A slinku'i, as far as word breaking is concerned, is anything that matches
> >the following regex:
> >^C[raf3]*([gim]?$|[raf4]?y)
> >where
> >C matches any consonant
> >[raf3] matches any 3-letter rafsi
> >[raf4] matches any 4-letter rafsi
> >[gim] matches any gismu.
>
> A correct algorithm would use the structures CVC/CVV/CCV for raf3,
> CVCC/CCVC for raf4 and CVCCV/CCVCV for gim. It doesn't matter whether the
> values are in fact actually used. Post-freeze it seems logical that it
> would and should be easier to add and subtract from the gismu/rafsi lists
> than to change the entire morphology, so the morphology is defined at a
> higher level than the specific list of words.

The program matches the structures, not a list of words, and I meant the 
algorithm to do so also. If the algorithm is unclear, check the program. If 
they disagree, tell me. I will use a list of words when I write the part that 
analyzes a lujvo into rafsi and looks them up; if a rafsi is not in the list 
it will say "?", e.g. {zbekyxoxmau} will be analyzed as {zbek? ? zmadu}.
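
A minimal sketch of the structure-based matching described above, in Python. This is not the valfendi program itself: the letter classes, the optional apostrophe inside CVV, and the names `is_slinkuhi`, `analyze`, `RAFSI_TO_GISMU` and `GISMU` are assumptions made for illustration, and the shapes ignore which consonant clusters are actually permissible. The word list is a one-entry stand-in, used only for the lookup step.

```python
import re

C = "[bcdfgjklmnprstvxz]"   # any consonant
V = "[aeiou]"               # any vowel (y is treated separately, as the hyphen)

# Structural shapes only; no word list is consulted here.
RAF3 = f"(?:{C}{V}{C}|{C}{V}'?{V}|{C}{C}{V})"   # CVC / CVV (or CV'V) / CCV
RAF4 = f"(?:{C}{V}{C}{C}|{C}{C}{V}{C})"         # CVCC / CCVC
GIM = f"(?:{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V})"    # CVCCV / CCVCV

# ^C[raf3]*([gim]?$|[raf4]?y), with the classes spelled out as shapes.
SLINKUHI = re.compile(f"^{C}{RAF3}*(?:{GIM}?$|{RAF4}?y)")

def is_slinkuhi(word):
    """True if the word matches the slinku'i pattern from its first letter."""
    return SLINKUHI.match(word) is not None

# Hypothetical, tiny word list; a real one would hold every gismu and rafsi.
RAFSI_TO_GISMU = {"mau": "zmadu"}
GISMU = ["zmadu"]

# One piece of a lujvo: a 4-letter rafsi plus y, a final gismu, or a 3-letter rafsi.
PIECE = re.compile(f"({RAF4})y|({GIM})$|({RAF3})y?")

def analyze(lujvo):
    """Split a lujvo left to right and look each piece up; unknown pieces get '?'."""
    out, pos = [], 0
    while pos < len(lujvo):
        m = PIECE.match(lujvo, pos)
        if m is None:
            return None  # not decomposable by this naive splitter
        raf4, gim, raf3 = m.groups()
        if raf4:
            # A 4-letter rafsi is a gismu minus its final vowel.
            out.append(next((g for g in GISMU if g.startswith(raf4)), raf4 + "?"))
        elif gim:
            out.append(gim if gim in GISMU else gim + "?")
        else:
            out.append(RAFSI_TO_GISMU.get(raf3, "?"))
        pos = m.end()
    return " ".join(out)

print(is_slinkuhi("slinku'i"))
print(analyze("zbekyxoxmau"))
```

The splitter is greedy and takes the first shape that fits, so it does not resolve ambiguous decompositions; it is only meant to show where the structural match ends and the word-list lookup begins.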

> (In addition "ala'um" is not an "option"; there should be no options in an
> official algorithm. It is either valid or invalid according to the rules.)

The Book is gricingly unclear about this detail:

Names are not permitted to have the sequences ``la'', ``lai'', or ``doi'' 
embedded in them, unless the sequence is immediately preceded by a consonant.

Since anything that contains the sequence "lai" contains the sequence "la", 
and following "la" or "lai" with a vowel makes it unbreakable just as 
preceding it with a consonant does, I griced it to mean "...preceded by a 
consonant or followed by a vowel". But if that were the case, why isn't 
"la'i" mentioned? A few lines later it says "No cmene may have the syllables 
``la'', ``lai'', or ``doi'' in them, unless immediately preceded by a 
consonant." In {laus}, "la" is a sequence, but not a syllable. In {la,us}, it 
is both a sequence and a syllable. But the presence or absence of commas in a 
word makes no difference to the identity or validity of the word. So is {laus}
valid or not? {laus} cannot be broken into {la ,us}, nor {ala'um} into {a la
'um}, because a word cannot begin with an apostrophe or with a pauseless
vowel.
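
The two readings can be compared with a small Python sketch. The function name `offending` and the `lenient` flag are hypothetical. The strict mode follows the Book's wording literally; the lenient mode is the "preceded by a consonant or followed by a vowel" reading, and it additionally treats a following apostrophe like a vowel, which is an assumption based on the remark that a word cannot begin with an apostrophe.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")
CMAVO = ("la", "lai", "doi")

def offending(name, lenient=False):
    """Return (cmavo, index) pairs that make the name invalid under the chosen reading."""
    s = name.replace(",", "")  # commas do not change the identity of a word
    hits = []
    for cmavo in CMAVO:
        start = s.find(cmavo)
        while start != -1:
            end = start + len(cmavo)
            # Strict rule: safe only if immediately preceded by a consonant.
            guarded_before = start > 0 and s[start - 1] in CONSONANTS
            # Lenient reading (assumption): also safe if what follows could not start a word.
            guarded_after = (lenient and end < len(s)
                             and (s[end] in VOWELS or s[end] == "'"))
            if not (guarded_before or guarded_after):
                hits.append((cmavo, start))
            start = s.find(cmavo, start + 1)
    return hits

for name in ("laus", "la,us", "ala'um"):
    print(name, offending(name), offending(name, lenient=True))
```

Each of the three sequences is checked on its own because, under the lenient reading, the "i" of "lai" protects the embedded "la" but not the "lai" itself.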

phma