Re: letter frequencies

> Has anyone done any research into the frequencies of the letters
> (and digraphs, trigraphs, etc.) in written Lojban?

Here's part of a short article by lojbab, originally printed in JL9:

Scrabble (TM) for Lojban

JCB first wrote up some rules for playing SCRABBLE (TM) ... back in the
late 70's.  No one ever reported playing the game other than JCB....
When [the Logical Language Group] ... rebuilt the >gismu< list for the
Lojban version of the language, we also changed these frequencies.

Since we've now baselined the list of >gismu<, that portion of the frequencies
should not change.  The >cmavo< list isn't baselined, but we know that 90%
of all possible V, CV, VV, and CVV combinations have a meaning; the holes
are nearly random, except for the xVV >cmavo< reserved for experimental use.
So we can estimate the final letter frequencies with some expectation of validity.

>lujvo< are harder to deal with, since there isn't even a list of them yet.
However, in developing and tuning the >rafsi< assignments, I had gathered
statistical data from several thousand >tanru< proposed over several years
by JCB, the old Word Maker's Council, Eaton project volunteers, contributors
to TL, and our own >gismu< list workers.  These data reflect no actual set
of words, only word proposals.  However, the size of the sample suggests that
they can't be too far off in representing the eventual letter-frequency
distribution for Lojban....

[Note by JC:  Of course, these data do >not< reflect letter frequencies
in running text: they assume all words are equally probable, which is valid
for word-games but not valid for text that is actually about something.]
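
To make the "all words equally probable" assumption concrete, here is a minimal Python sketch of that kind of count. It is not lojbab's actual procedure; the function name and the three sample words are only illustrative. Each distinct word contributes once, however often it would turn up in running text, and the apostrophe is counted like any other letter.

```python
from collections import Counter

def letter_counts(words):
    """Count letter occurrences over a word list, each distinct word once.

    This is the word-game assumption: every word is equally probable,
    so how often a word appears in real text is ignored.
    """
    counts = Counter()
    for word in set(words):      # drop duplicates: one vote per word
        counts.update(word)      # adds one per character, including "'"
    return counts

# Illustrative input only; a real count would run over the whole word list.
sample_words = ["klama", "prami", "gismu", "klama"]
counts = letter_counts(sample_words)
```

A count over running text would skip the set() step and weight each word by how often it actually occurs, which is the distinction the note above is drawing.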

Letter	non-lujvo frequencies			with-lujvo frequencies
	occurs	Lojban tiles/points	English tiles/points	occurs	Lojban tiles/points

'	316	4/1		0		1012	4/2
a	991	12/1		9/1		2949	10/1
b	212	2/5		2/3		865	2/4
c	360	4/2		2/3		1040	3/3
d	219	3/3		4/2		862	2/4
e	496	6/1		12/1		1560	5/1
f	149	2/5		2/4		616	2/4
g	146	2/5		3/2		589	2/4
h	0	0		2/4		0	0
i	1045	12/1		9/1		2678	10/1
j	249	3/3		1/8		1008	3/3
k	285	3/3		1/5		1107	3/3
l	348	4/2		4/1		1395	4/2
m	254	3/3		2/3		1048	3/3
n	563	7/1		6/1		2047	7/1
o	395	5/2		8/1		1046	4/2
p	203	2/5		2/3		872	2/4
q	0	0		1/10		0	0
r	460	5/1		6/1		2979	7/1
s	339	4/2		4/1		1363	4/2
t	361	4/2		6/1		1359	4/2
u	642	8/1		4/1		1755	6/1
v	119	1/8		2/4		490	1/6
w	0	0		2/4		0	0
x	108	1/9		1/8		532	1/6
y	19	0		2/4		5553	8/1
z	87	1/10		1/10		259	1/9
blank	0	2/0		2/0		0	2/0

Total		100/184		100/187			100/190
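
The article does not say how the "occurs" counts were turned into tile counts. One plausible approach, offered here purely as an assumption, is a proportional split with largest-remainder rounding. In this Python sketch, allocate_tiles is a hypothetical helper, and the four input values are non-lujvo "occurs" figures copied from the table; the 20-tile total is just for the example.

```python
from math import floor

def allocate_tiles(occurrences, total_tiles):
    """Split total_tiles among letters in proportion to their counts.

    Uses largest-remainder rounding (an assumption, not the article's
    stated method): floor every quota, then give the leftover tiles to
    the letters with the largest fractional parts.
    """
    grand_total = sum(occurrences.values())
    if grand_total == 0:
        raise ValueError("no occurrences to allocate from")
    quotas = {k: v * total_tiles / grand_total for k, v in occurrences.items()}
    tiles = {k: floor(q) for k, q in quotas.items()}
    leftover = total_tiles - sum(tiles.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - tiles[k], reverse=True)
    for letter in by_remainder[:leftover]:
        tiles[letter] += 1
    return tiles

# Four non-lujvo "occurs" values from the table, split over 20 tiles.
sample = {"a": 991, "i": 1045, "n": 563, "z": 87}
tiles = allocate_tiles(sample, total_tiles=20)
```

A purely proportional split will not necessarily reproduce the table's figures, and point values would still have to be assigned separately.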