From: Pierre Abbat
Reply-To: phma@webjockey.net
To: lojban-list@lojban.org
Subject: [lojban] Bug in word break algorithm
Date: Mon, 6 Jan 2003 09:48:03 -0500

3] If the piece we have left starts with a vowel, find the first consonant.
If the first consonant is part of a consonant cluster (only CC-form this time), and this consonant cluster is NOT a valid initial cluster (that is, one in which each adjacent pair of consonants is a valid initial pair), then we can resolve the entire piece as a le'avla (e.g. /antipAsto/); otherwise (if the first consonant is NOT part of a consonant cluster, or the consonant cluster IS a valid initial cluster), break off before the first consonant as a cmavo (e.g. /a'ofArlu/ becomes /a'o/ = cmavo + /fArlu/ = unresolved; or, /aismAcu/ becomes /ai/ = cmavo + /smAcu/ = unresolved).

This gives the wrong answer if the part after the vowel is a slinku'i, for example /esKRIma/. How can I recognize a slinku'i by the front-middle method or something similar?

phma
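
For anyone who wants to reproduce the problem, here is a rough Python sketch of step 3 exactly as quoted above. It is not the reference implementation and it does not fix anything. The names `step3`, `is_valid_initial` and `INITIAL_PAIRS` are made up for illustration. It assumes the caller has already checked that the piece starts with a vowel, that the apostrophe and `y` are not consonants, and that stress capitals can be folded to lower case. The table of initial pairs is the usual list of 48.

```python
# Hypothetical sketch of step 3 as quoted; names are illustrative only.
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiouy")

# The 48 permissible initial consonant pairs.
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)


def is_valid_initial(cluster):
    """True if every adjacent pair of consonants is a valid initial pair."""
    return all(cluster[i:i + 2] in INITIAL_PAIRS
               for i in range(len(cluster) - 1))


def step3(piece):
    """Return (kind, resolved, unresolved) for a piece starting with a vowel."""
    piece = piece.lower()  # stress marking (capitals) is ignored here
    if not piece or piece[0] not in VOWELS:
        raise ValueError("step 3 only applies to a piece starting with a vowel")

    # Find the first consonant; the apostrophe does not count as one.
    first = next((i for i, ch in enumerate(piece) if ch in CONSONANTS), None)
    if first is None:
        # Assumption: a piece with no consonant at all is left as one cmavo piece.
        return ("cmavo", piece, "")

    # Collect the run of consonants starting there (CC-form only, no y).
    end = first
    while end < len(piece) and piece[end] in CONSONANTS:
        end += 1
    cluster = piece[first:end]

    if len(cluster) >= 2 and not is_valid_initial(cluster):
        return ("le'avla", piece, "")  # whole piece resolved as a le'avla
    # Otherwise break off before the first consonant as a cmavo.
    return ("cmavo", piece[:first], piece[first:])


for word in ["antipAsto", "a'ofArlu", "aismAcu", "esKRIma"]:
    print(word, step3(word))
```

On /esKRIma/ the sketch breaks off /e/ as a cmavo and leaves /skrima/ unresolved, because /skr/ passes the initial-cluster check. That is the behaviour reported above as wrong.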