From rlpowell@digitalkingdom.org Wed Apr 24 15:59:18 2002
Date: Wed, 24 Apr 2002 16:00:00 -0700
To: lojban@yahoogroups.com
Subject: Re: [lojban] cmavo frequency list
Message-ID: <20020424230000.GY28651@digitalkingdom.org>
Mail-Followup-To: lojban@yahoogroups.com
References: <20020424002708.GA3992@twcny.rr.com> <Pine.GSO.4.40.0204232025580.16634-100000@ucsub.colorado.edu> <20020424045929.GB4465@twcny.rr.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020424045929.GB4465@twcny.rr.com>
User-Agent: Mutt/1.3.28i
From: Robin Lee Powell <rlpowell@digitalkingdom.org>
X-Yahoo-Group-Post: member; u=66827819
X-Yahoo-Profile: robinleepowell

On Wed, Apr 24, 2002 at 12:59:29AM -0400, Rob Speer wrote:
> On Tue, Apr 23, 2002 at 08:32:27PM -0600, Jay Kominek wrote:
> > 
> > On Tue, 23 Apr 2002, Rob Speer wrote:
> > Out of curiosity, are you using jbofi'e or vlatai or something
> > along those lines to handle the lexing?
> 
> No. It would probably be better if I did, but right now I match
> against this regular expression to determine whether a word is a cmavo
> (or cmavo compound):
> 
> ^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$

I assume the text is broken into words first?
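
For concreteness, here is a minimal Python sketch of what I imagine that step looks like. The regular expression is copied verbatim from the quoted message; everything else is my assumption, not Rob's actual code: the names split_words and find_cmavo are hypothetical, and I'm guessing that words are separated by whitespace only.

```python
import re

# Pattern copied verbatim from the quoted message.
CMAVO_RE = re.compile(r"^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$")

def split_words(text):
    """Assumption: lowercase the text and split on whitespace only."""
    return text.lower().split()

def find_cmavo(text):
    """Return the words that the pattern accepts as cmavo or cmavo compounds."""
    return [word for word in split_words(text) if CMAVO_RE.match(word)]

if __name__ == "__main__":
    print(find_cmavo("le datni cu djica le nu zifre .iku'i .oi"))
```

If the text is not split first, the ^ and $ anchors mean the pattern is tested against the whole line rather than against each word.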

> > And, have you considered trying to include the IRC channel logs?
> 
> I considered it. Where could I get them?

I can send them to you.

> The problem there is that I'd need some way to distinguish Lojban text
> from English.

Erk. You'd probably have to weed through it by hand, though you could
certainly grep a lot of it in or out first.
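
As a rough idea of what that first pass might look like, here is a Python sketch that scores each line by the fraction of words shaped like cmavo (using the pattern quoted above) or like gismu (CVCCV or CCVCV). This is only a heuristic I'm assuming for illustration: the names lojban_fraction and looks_lojban and the 0.8 threshold are hypothetical, and short English words such as "to" or "I" will match the cmavo pattern, so hand checking would still be needed.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# cmavo pattern copied verbatim from the quoted message.
CMAVO_RE = re.compile(r"^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$")
# gismu have one of two shapes: CVCCV or CCVCV.
GISMU_RE = re.compile(f"^({C}{V}{C}{C}{V}|{C}{C}{V}{C}{V})$")

def lojban_fraction(line):
    """Fraction of whitespace-separated words that look like cmavo or gismu."""
    words = line.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if CMAVO_RE.match(w) or GISMU_RE.match(w))
    return hits / len(words)

def looks_lojban(line, threshold=0.8):
    """Hypothetical cutoff: treat a line as Lojban if most words match."""
    return lojban_fraction(line) >= threshold

if __name__ == "__main__":
    for line in ["le datni cu djica le nu zifre", "I can send them to you."]:
        print(looks_lojban(line), line)
```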

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno
je xlali -- RLP http://www.lojban.org/

