From: seidensticker@msn.com
Date: Sun, 18 Mar 2001 18:03:08 -0000
To: lojban@yahoogroups.com
Subject: Breaking up compound cmavo
Message-ID: <992t8s+mls7@eGroups.com>

I'm trying to figure out how to split compound cmavo, that is, cmavo that have been written together as a single word. For example, consider co'omi'e.

My approach was to compare the word against a sorted cmavo list, growing the extracted token one character at a time until I found an exact match. The problem is that this stops at the first match: after extracting "co", I'm left trying to make sense of "'omi'e", without success.

I'm assuming there's a simple algorithm for parsing these. Can someone point me to it?

Thanks.
Bob
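
P.S. To make the problem concrete, here is a minimal Python sketch of the shortest-match approach described above. The four-entry CMAVO list is only a stand-in for the real sorted cmavo list, and the function name split_shortest_match is made up for illustration.

```python
# Stand-in for the full cmavo list: only the entries needed for this example.
CMAVO = sorted(["co", "co'o", "mi", "mi'e"])


def split_shortest_match(word, cmavo=CMAVO):
    """Grow the candidate token one character at a time and cut at the
    first exact match against the cmavo list.

    Returns (tokens, leftover); leftover is "" if the whole word was consumed.
    """
    known = set(cmavo)
    tokens = []
    rest = word
    while rest:
        for end in range(1, len(rest) + 1):
            if rest[:end] in known:
                # First (shortest) match wins, even if a longer one exists.
                tokens.append(rest[:end])
                rest = rest[end:]
                break
        else:
            # No prefix of the remainder is a known cmavo: the split is stuck.
            return tokens, rest
    return tokens, ""


print(split_shortest_match("co'omi'e"))
```

With this list the call returns the single token "co" and the leftover "'omi'e", which is exactly where I get stuck: "co" matches before the loop ever gets a chance to try "co'o".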