
[lojban] Updated Letter Frequency Data



I've just generated new letter frequency data based on all but the
first section of:

http://www.teddyb.org/~rlpowell/hobbies/lojban/grammar/test_sentences.txt

So basically: the cLL, Alice, and a bunch of IRC.  If people would
like to suggest other non-trivially sized Lojban texts to add,
please let me know, but we've got ~650K characters here, so I think
the statistics are pretty good.

My data, sorted by number of occurrences:

  85004 i
  68959 a
  52225 e
  50517 u
  47944 o
  43807 l
  36358 n
  33169 c
  27097 m
  24514 r
  22989 s
  21356 d
  20536 '
  18317 t
  17749 k
  14459 b
  13359 p
  11990 j
   8810 g
   8007 z
   6857 v
   6616 x
   6288 f
   4580 y

As ratios:

0.130472888242183 i
0.105845370809523 a
0.080160305261493 e
0.077538691065483 u
0.073589385839292 o
0.067239492438300 l
0.055806000549495 n
0.050911195121464 c
0.041591264560472 m
0.037626610305031 r
0.035285883344307 s
0.032779386867677 d
0.031520766469124 '
0.028114816878406 t
0.027242992016969 k
0.022193161393507 b
0.020504768175936 p
0.018403486071523 j
0.013522494769818 g
0.012289967720991 z
0.010524829357167 v
0.010154917752226 x
0.009651469592805 f
0.007029855396795 y
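
For anyone who wants to reproduce this kind of table, here is a minimal
sketch of the counting and the ratio step. It is not the script that
produced the numbers above. The names letter_counts and letter_ratios are
hypothetical, and it assumes that only the 24 symbols listed above are
tallied and that uppercase (stress-marked) letters are folded to lowercase.
Each ratio is simply a letter's count divided by the total number of
counted symbols.

```python
from collections import Counter

# Assumption: the 24 symbols that appear in the tables above
# (the Lojban letters plus the apostrophe); everything else is ignored.
LOJBAN_SYMBOLS = set("abcdefgijklmnoprstuvxyz'")


def letter_counts(text):
    """Count occurrences of each Lojban symbol in text.

    Assumption: uppercase letters are folded to lowercase before counting.
    """
    return Counter(ch for ch in text.lower() if ch in LOJBAN_SYMBOLS)


def letter_ratios(counts):
    """Convert raw counts to ratios of the total counted symbols."""
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def print_table(counts):
    """Print the counts sorted by number of occurrences, largest first."""
    for letter, n in counts.most_common():
        print(f"{n:7d} {letter}")
```

Calling print_table(letter_counts(text)) on the contents of a corpus file
gives a listing in the same layout as the first table.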

The only previous work on this I'm aware of is:

http://www.lojban.org/files/papers/scrabble.unf

Which, it turns out, is amazingly flawed (which is fine, because
that was a long time ago!).

Using the data without lujvo, we have:

i       1045
a       991 
u       642 
n       563 
e       496 
r       460 
o       395 
t       361 
c       360 
l       348 
s       339 
'       316 
k       285 
m       254 
j       249 
d       219 
b       212 
p       203 
f       149 
g       146 
v       119 
x       108 
z       87  
y       19  

which is only marginally different from what I have.
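
Since the two tables have very different totals, the raw counts can't be
compared directly; each has to be turned into shares first. The following
is a rough sketch of one way to do that comparison, not something that was
run for this message. The function names shares and largest_share_gaps are
made up for illustration, and it assumes each table is held as a dict
mapping a letter to its count.

```python
def shares(counts):
    """Turn a letter -> count mapping into letter -> share of the total."""
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def largest_share_gaps(counts_a, counts_b, top=5):
    """Return the letters whose shares differ most between two tables.

    Each result is (letter, share_in_a - share_in_b), ordered by the
    size of the gap; ties are broken alphabetically.
    """
    a, b = shares(counts_a), shares(counts_b)
    letters = set(a) | set(b)
    # A letter missing from one table is treated as having share 0 there.
    gaps = {ch: a.get(ch, 0.0) - b.get(ch, 0.0) for ch in letters}
    ordered = sorted(gaps.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return ordered[:top]
```

Passing the full corpus counts and the full no-lujvo counts would list the
letters on which the two data sets disagree most.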

Using the data with lujvo, however, which IIRC is what the Scrabble
frequencies were based on, we have the obviously biased:

y     5553
r     2979
a     2949
i     2678
n     2047
u     1755
e     1560
l     1395
s     1363
t     1359
k     1107
m     1048
o     1046
c     1040
'     1012
j     1008
p     872
b     865
d     862
f     616
g     589
x     532
v     490
z     359

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/