Re: [lojban] cmavo frequency list



On Tue, Apr 23, 2002 at 08:32:27PM -0600, Jay Kominek wrote:
> 
> On Tue, 23 Apr 2002, Rob Speer wrote:
> 
> > I seem to remember that there is so far no accurate list of the
> > frequencies with which each cmavo is used.
> 
> Wee
> 
> > So I wrote a script which would search Lojban text for cmavo, even in
> > compounds, and count up the frequency for each one.
> 
> Out of curiosity, are you using jbofi'e or vlatai or something along
> those lines to handle the lexing?

No. It would probably be better if I did, but right now I match against
this regular expression to determine whether a word is a cmavo (or cmavo
compound):

^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$

I had to leave out cmavo with "y", because otherwise I'd get false
positives on lujvo like "ricyci'e".
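
For illustration, here is a minimal Python sketch of how that check could
feed a frequency count. Only the first expression comes from this message;
the SINGLE_CMAVO pattern used to split compounds, the count_cmavo name, the
whitespace tokenizing, and the folding of a leading "." into the bare cmavo
are all assumptions, not a description of the actual script.

```python
import re
from collections import Counter

# The expression quoted in the message, unchanged.
CMAVO_OR_COMPOUND = re.compile(
    r"^([bcdfgjklmnprstvxz\.]?[aeiou]'?[aeiou]*)+\.?$"
)

# Assumption: a single cmavo inside a compound has the shape of one
# repetition of the group above. Splitting this way is a guess.
SINGLE_CMAVO = re.compile(r"[bcdfgjklmnprstvxz.]?[aeiou]'?[aeiou]*")


def count_cmavo(text):
    """Count cmavo in whitespace-separated text, splitting compounds."""
    counts = Counter()
    for word in text.lower().split():
        # Skip anything that is not a cmavo or cmavo compound.
        if not CMAVO_OR_COMPOUND.match(word):
            continue
        for piece in SINGLE_CMAVO.findall(word):
            # Assumption: ".i" and "i" are counted as the same cmavo.
            counts[piece.lstrip(".")] += 1
    return counts


sample = "mi klama le zarci .i ro'e lenu ricyci'e"
for cmavo, n in sorted(count_cmavo(sample).items()):
    print(cmavo, n)
```

In the sample, "lenu" is counted as "le" plus "nu", while "klama", "zarci"
and "ricyci'e" are rejected. Words with punctuation attached would be
skipped by this sketch, and cmavo containing "y" are left out, as noted
above.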

> And, have you considered trying to include the IRC channel logs?

I considered it. Where could I get them?

The problem there is that I'd need some way to distinguish Lojban text
from English.

> > Another script found the 121 cmavo which were not used anywhere. Some of
> > these were expected (lau), while it is quite surprising that others
> > have gone unused (ro'e). And of course most of the MEX words are in
> > there, but they are important nonetheless.
> 
> I'd like to point out (for what little it is worth), that I've used the
> following:
> 
> ke'e ko'o
> ci'i mo'a
> ro'e ro'o

I'm sure some of these, especially ro'e and ro'o, have been used many
times - but their usage didn't make it into any finished text.

-- 
Rob Speer