
[Wikichanges] Wiki page Word frequency lists changed by gleki



The page Word frequency lists was changed by gleki at 10:43 UTC

You can view the page by following this link:
 
http://www.lojban.org/tiki/Word%20frequency%20lists

You can view a diff back to the previous version by following this link: 
http://www.lojban.org/tiki/tiki-pagehistory.php?page=Word%20frequency%20lists&compare=1&oldver=8&newver=9


***********************************************************
The changes in this version follow below, followed by the current full page text.
***********************************************************


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 1-6 changed to +Lines: 1-14 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ ! Main lists (all words including cmavo clusters)
+ *[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list]
+ *((Word Frequency Lists: gismu))
+ !How to generate lists yourself
+ * See [https://groups.google.com/d/topic/lojban/KTPslnix3mQ/discussion|discussion] for details
+ * [http://www.lojban.org/corpus/corpus.txt.bz2|The Lojbanic corpus as a bz2-compressed text file].
+ ! Older stuff
* Older word frequencies can be found [http://www.lojban.org/files/roadmap.html#draft-dictionary_working|here]
- * ((Word Frequency Lists: gismu))<br />* ((Robin Lee Powell))'s [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency ordered word list], based on Lojban IRC, Alice, and a few other large texts.  There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists<br />
+ * {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.<br />!! ((Robin Lee Powell))'s lists<br />[http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency ordered word list], based on Lojban IRC, Alice, and a few other large texts.  There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists<br />!! Rob Speer's lists
The following is about Rob Speer's frequency lists, which have
fallen off the 'net.  Some of them have been recovered and attached

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 10-15 changed to +Lines: 18-21 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

These frequency lists are drawn from a corpus containing the contents of the lojban.org/texts directory, most of this Wiki's ((texts in Lojban)), as many ((IRC)) logs as I could find, the texts on ((CVS)), and a large portion of the ((jbosnu)) archives. I spent some time weeding out most of the English text, and tried to avoid picking up metalinguistic discussion (a word frequency list based on the main mailing list showed that ((lujvo)) is one of the most commonly used words).
- 
- [http://takeneggs.com/lojban/corpus.tar.gz|The corpus, in a .tar.gz archive.]

* {file name=freq_gismu.txt showdesc=1}

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 21-24 changed to +Lines: 27-28 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

mi'e ((rab.spir))
- 
- * {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.




***********************************************************
The new page content follows below.
***********************************************************

! Main lists (all words including cmavo clusters)

*[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list]

*((Word Frequency Lists: gismu))

!How to generate lists yourself

* See [https://groups.google.com/d/topic/lojban/KTPslnix3mQ/discussion|discussion] for details

* [http://www.lojban.org/corpus/corpus.txt.bz2|The Lojbanic corpus as a bz2-compressed text file].
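
As a rough illustration of the counting step, here is a minimal Python sketch. It is not the procedure from the linked discussion. It assumes the corpus has been downloaded locally as a bz2-compressed UTF-8 text file (the path passed to count_corpus is hypothetical), treats a word as a run of letters with internal apostrophes or commas, and lowercases everything, which merges cmene that differ only in stress capitalization.

```python
import bz2
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by ' or , inside it.
WORD_RE = re.compile(r"[a-z]+(?:[',][a-z]+)*")


def count_words(lines):
    """Return a Counter of lowercased word tokens from an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


def count_corpus(path):
    """Count words in a bz2-compressed text file (hypothetical local path)."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_words(handle)


if __name__ == "__main__":
    # Tiny in-memory sample so the sketch runs without the real corpus.
    sample = ["coi rodo .i mi tavla do", "mi nelci la lojban."]
    for word, n in count_words(sample).most_common(3):
        print(word, n)
```

With the real file in place, something like count_corpus("corpus.txt.bz2").most_common(100) would give the hundred most frequent tokens. Separating gismu, cmavo, lujvo and so on would need a further classification step.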

! Older stuff

* Older word frequencies can be found [http://www.lojban.org/files/roadmap.html#draft-dictionary_working|here]

* {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.
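
To give an idea of what a "sentence template" count could look like, here is a small Python sketch. It is an assumption-laden stand-in, not the method used to build the attached file: real selma'o tagging needs a cmavo dictionary, so this version only guesses a coarse word type from the consonant/vowel shape of each word (ends in a consonant: cmene; no consonant pair: cmavo or cmavo cluster; CVCCV or CCVCV: gismu; anything else: other brivla). The names word_type, line_template and template_counts are made up for the example.

```python
from collections import Counter

VOWELS = "aeiouy"  # y is treated as a vowel for shape purposes
GISMU_SHAPES = ("CVCCV", "CCVCV")


def shape(word):
    """Map a word to its consonant/vowel pattern, ignoring apostrophes."""
    return "".join("V" if ch in VOWELS else "C" for ch in word if ch.isalpha())


def word_type(word):
    """Coarse guess at the word type from its shape alone (not real selma'o)."""
    pattern = shape(word)
    if pattern.endswith("C"):
        return "cmene"
    if "CC" not in pattern:
        return "cmavo"
    if "'" not in word and pattern in GISMU_SHAPES:
        return "gismu"
    return "brivla"


def line_template(line):
    """Turn one line of text into a space-separated sequence of word types."""
    words = (token.strip(".,").lower() for token in line.split())
    return " ".join(word_type(w) for w in words if shape(w))


def template_counts(lines):
    """Count how often each template occurs, skipping empty lines."""
    return Counter(line_template(line) for line in lines if line.strip())


if __name__ == "__main__":
    log = ["mi klama le zarci", "do klama le zarci", "coi la lojban."]
    for template, n in template_counts(log).most_common():
        print(n, template)
```

Replacing word_type with a lookup into a cmavo-to-selma'o table would bring the templates closer to the ones described above.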

!! ((Robin Lee Powell))'s lists

[http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency-ordered word list], based on Lojban IRC, Alice, and a few other large texts.  There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists.

!! Rob Speer's lists

The following is about Rob Speer's frequency lists, which have fallen off the 'net.  Some of them have been recovered and attached here.



The word frequency lists as of 2003/4/30, stored on a separate server.



These frequency lists are drawn from a corpus containing the contents of the lojban.org/texts directory, most of this Wiki's ((texts in Lojban)), as many ((IRC)) logs as I could find, the texts on ((CVS)), and a large portion of the ((jbosnu)) archives. I spent some time weeding out most of the English text, and tried to avoid picking up metalinguistic discussion (a word frequency list based on the main mailing list showed that ((lujvo)) is one of the most commonly used words).



* {file name=freq_gismu.txt showdesc=1}

* {file name=freq_cmavo21.txt showdesc=1}

* BROKEN LINK: [http://takeneggs.com/lojban/compounds.txt|cmavo compounds]

* BROKEN LINK: [http://takeneggs.com/lojban/lujvo_freq.txt|lujvo] (updated 2003/7/12; non-lujvo removed; malformed almost-lujvo marked with *)

* BROKEN LINK: [http://takeneggs.com/lojban/fuhivla_freq.txt|fu'ivla]

* BROKEN LINK: [http://takeneggs.com/lojban/cmene_freq.txt|cmene]



mi'e ((rab.spir))


_______________________________________________
Wikichanges mailing list
Wikichanges@lojban.org
http://mail.lojban.org/mailman/listinfo/wikichanges