From: Robin Lee Powell <rlpowell@digitalkingdom.org>
Date: Wed, 24 Apr 2002 16:00:00 -0700
To: lojban@yahoogroups.com
Subject: Re: [lojban] cmavo frequency list

On Wed, Apr 24, 2002 at 12:59:29AM -0400, Rob Speer wrote:
> On Tue, Apr 23, 2002 at 08:32:27PM -0600, Jay Kominek wrote:
> >
> > On Tue, 23 Apr 2002, Rob Speer wrote:
> > Out of curiosity, are you using jbofi'e or vlatai or something
> > along those lines to handle the lexing?
>
> No. It would probably be better if I did, but right now I match
> against this regular expression to determine whether a word is a cmavo
> (or cmavo compound):
>
> ^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$

I assume the text is broken into words first?

> > And, have you considered trying to include the IRC channel logs?
>
> I considered it. Where could I get them?

I can send them to you.

> The problem there is that I'd need some way to distinguish Lojban text
> from English.

Erk.
You'd probably have to weed through it by hand... Certainly you could
grep in and out a lot of it...

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno
je xlali -- RLP http://www.lojban.org/
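
For anyone following the thread, here is a rough sketch of how the regular expression quoted above might be used for a frequency count, assuming the text is first split into words on whitespace. This is not Rob's actual script: the function names `is_cmavo` and `cmavo_frequencies` are made up for illustration, and lowercasing the input is my own assumption. Only the pattern itself is copied from the message.

```python
import re
from collections import Counter

# Pattern copied as posted in the thread; everything else here is a guess
# at how it could be applied, not the original script.
CMAVO_RE = re.compile(r"^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$")


def is_cmavo(word):
    """Return True if a single word has the shape of a cmavo or cmavo compound."""
    return CMAVO_RE.match(word) is not None


def cmavo_frequencies(text):
    """Split text on whitespace and count the words the pattern accepts.

    Lowercasing and whitespace splitting are assumptions for this sketch.
    """
    words = text.lower().split()
    return Counter(w for w in words if is_cmavo(w))


if __name__ == "__main__":
    sample = ("le datni cu djica le nu zifre .iku'i .oi "
              "le so'e datni cu to'e te pilno je xlali")
    for word, count in cmavo_frequencies(sample).most_common():
        print(count, word)
```

Because the pattern only looks at word shape, short English words such as "a", "to" or "me" also pass it, so it does not by itself solve the problem of telling Lojban text from English in mixed logs.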