Re: Lojban Certification Program

On Fri, Sep 18, 2009 at 2:30 PM, Matt Arnold <matt.mattarn@gmail.com> wrote:
> I think the question is whether to use the most common 500 words, or
> weight it in favor of cmavo.

In favor or against cmavo? I think it is cmavo that are
overrepresented in the initial segment. In the first 100 words there
are only 18 gismu. It's pretty hard to construct sentences which use
82 cmavo but are constrained to only 18 gismu.
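
A count like that can be rechecked mechanically. What follows is only a
rough sketch: it assumes the ranked word list is already loaded as a
Python list, and it recognizes gismu purely by shape (five letters,
CVCCV or CCVCV), without consulting the official gismu list or checking
for valid consonant clusters. The names looks_like_gismu and count_gismu
are made up for illustration.

```python
import re

# Lojban consonants and vowels (apostrophe and "y" never occur in gismu).
C = "bcdfgjklmnprstvxz"
V = "aeiou"

# Shape-only test: CVCCV or CCVCV. This is a heuristic, not a lookup
# in the real gismu list.
GISMU_SHAPE = re.compile(
    rf"^(?:[{C}][{V}][{C}]{{2}}[{V}]|[{C}]{{2}}[{V}][{C}][{V}])$"
)

def looks_like_gismu(word):
    """Return True if the word has one of the two gismu shapes."""
    return bool(GISMU_SHAPE.match(word))

def count_gismu(ranked_words, n):
    """Count gismu-shaped words among the first n entries of a ranked list."""
    return sum(looks_like_gismu(w) for w in ranked_words[:n])

# Tiny hand-made sample, not the real frequency list.
sample = ["le", ".i", "mi", "cu", "cusku", "nu", "klama", "li'u"]
print(count_gismu(sample, 8))
```

Run against the real frequency list with n=100, this should give a
figure to compare with the 18 mentioned above.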

> I still think 500 is too many. How many of you agree?

These are the top 50 cmavo from

le	11208
.i	7438
mi	3324
cu	3253
nu	3034
do	2470
la	2319
se	2057
lo	2034
lu	1944
li'u	1933
coi	1669
na	1398
be	1199
gi'e	1161
sei	1154
ca	1098
ro	967
ma	751
go'i	749
noi	725
ku'i	644
nai	640
fi	633
lei	631
kei	624
da	614
.a	613
du'u	568
xu	567
pu	561
ko	542
bu	528
.e	525
ka	522
ba	516
je	506
loi	487
zo	463
doi	449
poi	447
je'e	380
te	374
di'u	367
no	365
pa	361
bo	345
pe	340
vi	337
co'a	336

But we probably need to do some fiddling. For example, "no" and "pa"
are the only numbers that made it to the top 50, but I think all
numbers should be tested in the first level. The only FA that made it
is "fi". It's reasonable that "fi" is the most frequent, but
fa/fe/fi/fo/fu are learned together and should be tested together, so
if "fi" is included they should all be (they might be left for the
second level). Similarly for se/te/ve/xe. Some of the top 50 I think
we can safely exclude, like "sei", which is there because of the
frequent "sei X cusku", especially in the Alice translation. lu-li'u
may not need to be included either. (But I would include "zo",
especially if we include "cmene". We can't use "cmene" without "zo".)
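
The kind of fiddling described above could be sketched roughly as
follows. This is only an illustration under some assumptions: "all
numbers" is taken to mean the ten digits no through so, a family is
pulled in whole as soon as any one member is in the list (the FA series
might equally be deferred to the second level), and only "sei" is
dropped. The names FAMILIES, EXCLUDE and adjust are hypothetical.

```python
# Groups of cmavo that are learned together and so tested together.
FAMILIES = [
    ("fa", "fe", "fi", "fo", "fu"),          # FA place tags
    ("se", "te", "ve", "xe"),                # SE conversions
    ("no", "pa", "re", "ci", "vo",
     "mu", "xa", "ze", "bi", "so"),          # digits 0-9 (assumed scope)
]

# Words dropped because their frequency is an artifact of the corpus.
EXCLUDE = {"sei"}

def adjust(top_words, families=FAMILIES, exclude=EXCLUDE):
    """Drop excluded words, then complete any family that is partly present."""
    result = [w for w in top_words if w not in exclude]
    seen = set(result)
    for family in families:
        if any(w in seen for w in family):
            for w in family:
                if w not in seen:
                    result.append(w)
                    seen.add(w)
    return result

# Small excerpt from the list above, not the full top 50.
print(adjust(["le", "fi", "sei", "no", "pa", "se", "te"]))
```

Words added this way go at the end of the list, so the original
frequency order of the surviving words is kept.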

Mark's proposed list also has about 50 cmavo by my count, and it has
much overlap with the above list, as expected, but also some
lo, la, cu, mi, do, ti, ta, tu, and some other KOhA, nu, ka, ni, all
of SE, ca, pu, ba, NOI, GOI, .i, A,...
ku, kei and when they're needed, and cu as mentioned above.
A small selection of UI/CAI and COI (and DOI)
Numerals no-so and base-10 construction, perhaps also ro.

I think some 50 cmavo is about right for the first level. Then there
should be some cmevla, not too many but in any case cmevla are easy as
they don't need to be memorized, just recognized, and they are one of
the first things people learn anyway, so I don't think we need to
worry about how many of them we include. And then some reasonable
number of gismu that allow us to write meaningful sentences.

These are the top 50 gismu from

cusku	1295
mutce	388
klama	305
zvati	287
cmalu	277
tavla	250
viska	241
drata	236
djuno	219
pensi	219
catlu	217
nelci	202
barda	200
djica	197
gunka	193
cliva	190
pilno	171
cmene	168
jimpe	166
prenu	164
troci	151
xamgu	146
kumfa	143
citka	136
valsi	136
tirna	129
sutra	127
zdani	126
facki	125
ciska	124
stedu	124
pluta	123
nenri	122
cizra	120
ractu	119
simlu	118
xruti	118
drani	116
jitfa	111
voksa	111
dukse	109
krixa	109
tsali	109
jundi	108

Again we will probably need to make adjustments, but we won't know
which ones until we start producing the questions. We could start with
that list and then add/subtract words as needed.

I would not include fu'ivla in the first level. A few lujvo perhaps
yes, but unfortunately I can't open the lujvo frequency list to get
some idea what the most frequent are. Probably things with sel-, nun-,
-gau, and such.

mu'o mi'e xorxes