Re: Lojban Certification Program
On Fri, Sep 18, 2009 at 2:30 PM, Matt Arnold <matt.mattarn@gmail.com> wrote:
>
> I think the question is whether to use the most common 500 words, or
> weight it in favor of cmavo.
In favor or against cmavo? I think it is cmavo that are
overrepresented in the initial segment. In the first 100 words there
are only 18 gismu. It's pretty hard to construct sentences which use
82 cmavo but are constrained to only 18 gismu.
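A minimal sketch of how that count could be reproduced from a combined word-frequency list. It assumes the list has already been loaded as (word, count) pairs; the function names are made up for illustration, and the gismu test checks only the CVCCV/CCVCV letter shape, not membership in the official gismu list.

```python
import re

# Lojban consonants and the vowels that can appear in a gismu.
C = "bcdfgjklmnprstvxz"
V = "aeiou"

# Shape test only: five letters, CVCCV or CCVCV.
GISMU_SHAPE = re.compile(
    rf"^(?:[{C}][{V}][{C}]{{2}}[{V}]|[{C}]{{2}}[{V}][{C}][{V}])$"
)


def looks_like_gismu(word):
    """Return True if the word has the gismu letter shape (an approximation)."""
    return bool(GISMU_SHAPE.match(word))


def count_gismu_in_top(freq_pairs, n=100):
    """Count gismu-shaped words among the n most frequent entries."""
    ranked = sorted(freq_pairs, key=lambda pair: pair[1], reverse=True)
    return sum(1 for word, _ in ranked[:n] if looks_like_gismu(word))


# Tiny illustrative sample taken from the counts quoted in this message.
sample = [("le", 11208), (".i", 7438), ("mi", 3324),
          ("li'u", 1933), ("cusku", 1295), ("mutce", 388)]
print(count_gismu_in_top(sample, n=6))
```

Run against the full combined list with n=100, this would give the number of gismu in the initial segment.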
> I still think 500 is too many. How many of you agree?
These are the top 50 cmavo from
http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/cmavo_freq
le 11208
.i 7438
mi 3324
cu 3253
nu 3034
do 2470
la 2319
se 2057
lo 2034
lu 1944
li'u 1933
coi 1669
na 1398
be 1199
gi'e 1161
sei 1154
ca 1098
ro 967
ma 751
go'i 749
noi 725
ku'i 644
nai 640
fi 633
lei 631
kei 624
da 614
.a 613
du'u 568
xu 567
pu 561
ko 542
bu 528
.e 525
ka 522
ba 516
je 506
loi 487
zo 463
doi 449
poi 447
je'e 380
te 374
di'u 367
no 365
pa 361
bo 345
pe 340
vi 337
co'a 336
But we probably need to do some fiddling. For example, "no" and "pa"
are the only numbers that made it to the top 50, but I think all
numbers should be tested in the first level. The only FA that made it
is "fi". It's reasonable that "fi" is the most frequent, but
fa/fe/fi/fo/fu are learned together and should be tested together, so
if "fi" is included they should all be (they might be left for the
second level). Similarly for se/te/ve/xe. Some of them I think we can
safely exclude, like "sei", which is there because of the frequent
"sei X cusku", especially in the Alice translation. The lu/li'u pair
perhaps need not be included either. (But I would include "zo",
especially if we include "cmene". We can't use "cmene" without "zo".)
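The kind of fiddling described above can be sketched roughly as follows. This is only an illustration: the family table, the function name, and the choice of what to exclude are assumptions, not an agreed procedure. Any family with at least one member in the top list is completed, and explicitly excluded words are dropped.

```python
# Groups of cmavo that are learned together (assumed grouping for illustration).
FAMILIES = {
    "FA": ["fa", "fe", "fi", "fo", "fu"],
    "SE": ["se", "te", "ve", "xe"],
    "digits": ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"],
}


def adjust(top_words, families=FAMILIES, exclude=()):
    """Drop excluded words, then complete every family that is partly present."""
    result = [w for w in top_words if w not in exclude]
    seen = set(result)
    for members in families.values():
        if any(m in seen for m in members):
            for m in members:
                if m not in seen and m not in exclude:
                    result.append(m)
                    seen.add(m)
    return result


# A few words from the top-50 list; "sei" is excluded as suggested above.
top = ["le", "se", "sei", "fi", "no", "pa", "te"]
print(adjust(top, exclude={"sei"}))
```

The original frequency order is kept for words already in the list, and the added family members are appended at the end, so it stays easy to see which words came from the counts and which were added by hand.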
Mark's proposed list also has about 50 cmavo by my count, and it has
much overlap with the above list, as expected, but also some
differences:
<<
lo, la, cu, mi, do, ti, ta, tu, and some other KOhA, nu, ka, ni, all
of SE, ca, pu, ba, NOI, GOI, .i, A,...
ku, kei and when they're needed, and cu as mentioned above.
A small selection of UI/CAI and COI (and DOI)
Numerals no-so and base-10 construction, perhaps also ro.
>>
I think some 50 cmavo is about right for the first level. Then there
should be some cmevla, though not too many. In any case cmevla are
easy: they don't need to be memorized, just recognized, and they are
one of the first things people learn anyway, so I don't think we need
to worry about how many of them we include. And then some reasonable
number of gismu that allow us to write meaningful sentences.
These are the top 50 gismu from
http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/gismu_freq
cusku 1295
mutce 388
klama 305
zvati 287
cmalu 277
tavla 250
viska 241
drata 236
djuno 219
pensi 219
catlu 217
nelci 202
barda 200
djica 197
gunka 193
cliva 190
pilno 171
cmene 168
jimpe 166
prenu 164
troci 151
xamgu 146
kumfa 143
citka 136
valsi 136
tirna 129
sutra 127
zdani 126
facki 125
ciska 124
stedu 124
pluta 123
nenri 122
cizra 120
ractu 119
simlu 118
xruti 118
drani 116
jitfa 111
voksa 111
dukse 109
krixa 109
tsali 109
jundi 108
Again we will probably need to make adjustments, but we won't know
which ones until we start producing the questions. We could start with
that list and then add/subtract words as needed.
I would not include fu'ivla in the first level. A few lujvo perhaps
yes, but unfortunately I can't open the lujvo frequency list to get
some idea what the most frequent are. Probably things with sel-, nun-,
-gau, and such.
mu'o mi'e xorxes