word frequency list coming
- To: Multiple recipients of list LOJBAN <LOJBAN@CUVMB.BITNET>
- Subject: word frequency list coming
- From: Logical Language Group <lojbab@ACCESS.DIGEX.NET>
- Date: Thu, 24 Sep 1998 04:00:57 -0400
- Reply-to: Logical Language Group <lojbab@ACCESS.DIGEX.NET>
- Sender: Lojban list <LOJBAN@CUVMB.BITNET>
Well, I took a look around the Web and found a word-frequency/concordance
program. I have run it on the entirety of my Lojban text archive (which
unfortunately has some repetitions in it because of commentaries, quoted
text and revisions) and am working on filtering out all the garbage.
This covers all the Lojban text I have up to 10/94. My mail processing
is so far backlogged that I haven't extracted the Lojban text from my
logs since then (it takes me around 1-2 hours per month, so don't hold your
breath %^)
When I get done weeding it, I will make it available on the FTP site.
Probably within a week.
An advantage over previous efforts of this sort is that, if the concordance
function works, we can check some of the questionable entries to
see the context, since the program maintains a keyword-in-context database
(I think). Previously, good English words like "simple" were detected as
Lojban words (since they are valid ones), and without the ability to check
context, we had no way to find out whether the word was actually used in
LOJBAN.
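For readers unfamiliar with the idea, here is a minimal sketch of what a word-frequency count plus a keyword-in-context (KWIC) listing does. This is NOT the program mentioned above (which is unnamed here). The function names `tokenize`, `word_frequencies` and `kwic` are hypothetical, and the tokenizer is a simplifying assumption (lowercase runs of letters and apostrophes, so Lojban pause periods are simply dropped).

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a "word" is a run of ASCII letters or apostrophes;
    # periods, commas and other punctuation act as separators.
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    # Count how many times each token occurs in the text.
    return Counter(tokenize(text))

def kwic(text, keyword, width=3):
    # For each occurrence of `keyword`, show up to `width` tokens on
    # either side, with the keyword itself in brackets.
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

sample = "mi klama le zarci. this simple test is simple"
print(word_frequencies(sample)["simple"])
for line in kwic(sample, "simple", width=2):
    print(line)
```

The frequency count alone reports "simple" twice, just like any other word; only the KWIC lines show that both occurrences sit in English context, which is the kind of check described above.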
Of course most of the usage of lujvo that Jorge is interested in dates from
after 10/94 (but then he has done most of such writing anyway %^).
lojbab