Re: [lojban] cmavo frequency list



On Wed, Apr 24, 2002 at 12:59:29AM -0400, Rob Speer wrote:
> On Tue, Apr 23, 2002 at 08:32:27PM -0600, Jay Kominek wrote:
> > 
> > On Tue, 23 Apr 2002, Rob Speer wrote:
> > Out of curiosity, are you using jbofi'e or vlatai or something
> > along those lines to handle the lexing?
> 
> No. It would probably be better if I did, but right now I match
> against this regular expression to determine whether a word is a cmavo
> (or cmavo compound):
> 
> ^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$

I assume the text is broken into words first?
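
For what it's worth, here is a rough sketch of how the quoted regular expression might be applied once the text has been split on whitespace. The splitting step, the function name cmavo_counts, and the use of a Counter are my own assumptions for illustration, not a description of Rob's actual script.

```python
import re
from collections import Counter

# The pattern quoted above, unchanged: one or more (optional consonant
# or period) + vowel groups, with an optional trailing period.
CMAVO_RE = re.compile(r"^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$")

def cmavo_counts(text):
    """Count whitespace-separated words that match the cmavo pattern.

    Assumption: plain whitespace splitting is good enough as a first
    pass; a real lexer (jbofi'e, vlatai) would be more accurate.
    """
    words = text.split()
    return Counter(w for w in words if CMAVO_RE.match(w))

if __name__ == "__main__":
    sample = "le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno"
    for word, n in cmavo_counts(sample).most_common():
        print(n, word)
```

Words such as "datni" and "djica" are rejected because a consonant is not followed by a vowel, while compounds like ".iku'i" pass as a single match.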

> > And, have you considered trying to include the IRC channel logs?
> 
> I considered it. Where could I get them?

I can send them to you.

> The problem there is that I'd need some way to distinguish Lojban text
> from English.

Erk.  You'd probably have to weed through it by hand...

Certainly you could grep in and out a lot of it...
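
As one example of the "grep out" half, here is a minimal sketch that drops any line containing h, q, or w, since none of those letters is in the Lojban alphabet. The function names are hypothetical, and I am assuming the logs are plain text with one message per line; nicknames and timestamps would need stripping first, and plenty of English lines will still slip through, so hand-checking would still be needed.

```python
import re

# h, q and w do not occur in Lojban orthography, so a line containing
# any of them is almost certainly not pure Lojban.
NON_LOJBAN_LETTERS = re.compile(r"[hqw]", re.IGNORECASE)

def maybe_lojban(line):
    """Return True if the line is non-empty and has no h, q, or w.

    This is only a coarse first-pass filter, not a language detector.
    """
    return bool(line.strip()) and NON_LOJBAN_LETTERS.search(line) is None

def filter_lines(lines):
    """Keep only the lines that pass the coarse check."""
    return [ln for ln in lines if maybe_lojban(ln)]

if __name__ == "__main__":
    log = ["coi rodo", "where are the logs?", "mi na djuno"]
    for ln in filter_lines(log):
        print(ln)
```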

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ 	BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno
je xlali -- RLP 				http://www.lojban.org/