From: Pierre Abbat
To: lojban-list@lojban.org
Subject: [lojban] Bug in word break algorithm
Date: Mon, 6 Jan 2003 09:48:03 -0500

3] If the piece we have left starts with a vowel, find the first consonant. If the first consonant is part of a consonant cluster (only CC-form this time), and this consonant cluster is NOT a valid initial cluster (one in which each adjacent pair of consonants is a valid initial pair), then we can resolve the entire piece as a le'avla (e.g. /antipAsto/). Otherwise (if the first consonant is NOT part of a consonant cluster, or the consonant cluster IS a valid initial cluster), break off before the first consonant as a cmavo (e.g. /a'ofArlu/ becomes /a'o/ = cmavo + /fArlu/ = unresolved; or /aismAcu/ becomes /ai/ = cmavo + /smAcu/ = unresolved).

This gives the wrong answer if the part after the vowel is a slinku'i, for example /esKRIma/: the cluster /skr/ is a valid initial cluster, so the rule breaks off /e/ as a cmavo and leaves /sKRIma/. How can I recognize a slinku'i by the front-middle method or something similar?

phma
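For reference, step 3 as quoted can be sketched in Python roughly as follows. This is only an illustration under some assumptions of mine, not the algorithm's official code: the function name `resolve_vowel_initial` and the result labels are made up, a "cluster" is taken to be the maximal run of adjacent consonants, capitals are treated purely as stress marks, and the apostrophe is not counted as a consonant. The pair table is the usual list of 48 permissible initial pairs.

```python
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfgjklmnprstvxz")  # the apostrophe is deliberately excluded

# The 48 permissible initial consonant pairs of Lojban.
VALID_INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)


def resolve_vowel_initial(piece):
    """Apply the quoted step 3 to a piece that starts with a vowel.

    Returns a list of (label, text) pairs. The labels are hypothetical
    names chosen for this sketch.
    """
    lower = piece.lower()  # capitals only mark stress in this notation
    if not lower or lower[0] not in VOWELS:
        raise ValueError("step 3 only applies to vowel-initial pieces")

    # Find the first consonant.
    start = next((i for i, ch in enumerate(lower) if ch in CONSONANTS), None)
    if start is None:
        # Not covered by the quoted step; left unresolved here (assumption).
        return [("unresolved", piece)]

    # Collect the run of consonants beginning at the first consonant.
    end = start
    while end < len(lower) and lower[end] in CONSONANTS:
        end += 1
    cluster = lower[start:end]

    # A cluster that is not a valid initial cluster makes the whole
    # piece a le'avla.
    if len(cluster) >= 2 and any(
        cluster[i:i + 2] not in VALID_INITIAL_PAIRS
        for i in range(len(cluster) - 1)
    ):
        return [("le'avla", piece)]

    # Otherwise break off everything before the first consonant as a cmavo.
    return [("cmavo", piece[:start]), ("unresolved", piece[start:])]
```

With this sketch, `resolve_vowel_initial("esKRIma")` breaks off /e/ and leaves /sKRIma/ unresolved, which is exactly the behaviour reported above as wrong; it does not attempt the slinku'i check being asked about.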