Received: from localhost ([::1]:59528 helo=stodi.digitalkingdom.org) by stodi.digitalkingdom.org with esmtp (Exim 4.76) (envelope-from ) id 1UTrkn-0003OS-1e; Sun, 21 Apr 2013 03:44:17 -0700
Received: from 173-13-139-235-sfba.hfc.comcastbusiness.net ([173.13.139.235]:35688 helo=digitalkingdom.org) by stodi.digitalkingdom.org with smtp (Exim 4.76) (envelope-from ) id 1UTrkb-0003ON-J7 for wikichanges@lojban.org; Sun, 21 Apr 2013 03:44:15 -0700
Received: by digitalkingdom.org (sSMTP sendmail emulation); Sun, 21 Apr 2013 03:44:04 -0700
From: "Apache"
Date: Sun, 21 Apr 2013 03:44:04 -0700
To: wikichanges@lojban.org
X-PHP-Originating-Script: 48:htmlMimeMail.php
MIME-Version: 1.0
X-Spam-Score: 0.4 (/)
X-Spam_score: 0.4
X-Spam_score_int: 4
X-Spam_bar: /
Subject: [Wikichanges] Wiki page Word frequency lists changed by gleki
X-BeenThere: wikichanges@lojban.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: webmaster@lojban.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: wikichanges-bounces@lojban.org
Content-Length: 6034

The page Word frequency lists was changed by gleki at 10:43 UTC

You can view the page by following this link:
http://www.lojban.org/tiki/Word%20frequency%20lists

You can view a diff back to the previous version by following this link:
http://www.lojban.org/tiki/tiki-pagehistory.php?page=Word%20frequency%20lists&compare=1&oldver=8&newver=9

***********************************************************
The changes in this version follow below, followed after by the current full page text.
***********************************************************

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 1-6 changed to +Lines: 1-14 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ ! Main lists (all words including cmavo clusters)
+ *[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list]
+ *((Word Frequency Lists: gismu))
+ !How to generate lists yourself
+ * See [https://groups.google.com/d/topic/lojban/KTPslnix3mQ/discussion|discussion] for details
+ * [http://www.lojban.org/corpus/corpus.txt.bz2|The Lojbanic corpus as a bzip2-compressed text file].
+ ! Older stuff
* Older word frequencies can be found [http://www.lojban.org/files/roadmap.html#draft-dictionary_working|here]
- * ((Word Frequency Lists: gismu))
* ((Robin Lee Powell))'s [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency ordered word list], based on Lojban IRC, Alice, and a few other large texts. There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists
+ * {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.
!! ((Robin Lee Powell))'s lists
[http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency ordered word list], based on Lojban IRC, Alice, and a few other large texts. There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists
!! Rob Speer's lists
The following is about Rob Speer's frequency lists, which have fallen off the 'net. Some of them have been recovered and attached

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 10-15 changed to +Lines: 18-21 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
These frequency lists are drawn from a corpus containing the contents of the lojban.org/texts directory, most of this Wiki's ((texts in Lojban)), as many ((IRC)) logs as I could find, the texts on ((CVS)), and a large portion of the ((jbosnu)) archives. I spent some time weeding out most of the English text, and tried to avoid picking up metalinguistic discussion (a word frequency list based on the main mailing list showed that ((lujvo)) is one of the most commonly used words).
-
- [http://takeneggs.com/lojban/corpus.tar.gz|The corpus, in a .tar.gz archive.]
* {file name=freq_gismu.txt showdesc=1}

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 21-24 changed to +Lines: 27-28 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
mi'e ((rab.spir))
-
- * {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.

***********************************************************
The new page content follows below.
***********************************************************

! Main lists (all words including cmavo clusters)
*[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list]
*((Word Frequency Lists: gismu))
!How to generate lists yourself
* See [https://groups.google.com/d/topic/lojban/KTPslnix3mQ/discussion|discussion] for details
* [http://www.lojban.org/corpus/corpus.txt.bz2|The Lojbanic corpus as a bzip2-compressed text file].
! Older stuff
* Older word frequencies can be found [http://www.lojban.org/files/roadmap.html#draft-dictionary_working|here]
* {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.
!! ((Robin Lee Powell))'s lists
[http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency ordered word list], based on Lojban IRC, Alice, and a few other large texts. There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists
!! Rob Speer's lists
The following is about Rob Speer's frequency lists, which have fallen off the 'net. Some of them have been recovered and attached here.

The word frequency lists as of 2003/4/30. Stored on a separate server.

These frequency lists are drawn from a corpus containing the contents of the lojban.org/texts directory, most of this Wiki's ((texts in Lojban)), as many ((IRC)) logs as I could find, the texts on ((CVS)), and a large portion of the ((jbosnu)) archives. I spent some time weeding out most of the English text, and tried to avoid picking up metalinguistic discussion (a word frequency list based on the main mailing list showed that ((lujvo)) is one of the most commonly used words).
* {file name=freq_gismu.txt showdesc=1}
* {file name=freq_cmavo21.txt showdesc=1}
* BROKEN LINK: [http://takeneggs.com/lojban/compounds.txt|cmavo compounds]
* BROKEN LINK: [http://takeneggs.com/lojban/lujvo_freq.txt|lujvo] (updated 2003/7/12; non-lujvo removed; malformed almost-lujvo marked with *)
* BROKEN LINK: [http://takeneggs.com/lojban/fuhivla_freq.txt|fu'ivla]
* BROKEN LINK: [http://takeneggs.com/lojban/cmene_freq.txt|cmene]

mi'e ((rab.spir))

_______________________________________________
Wikichanges mailing list
Wikichanges@lojban.org
http://mail.lojban.org/mailman/listinfo/wikichanges
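
The page's "How to generate lists yourself" section only links to a discussion thread and the corpus download, so here is a minimal Python sketch of one way a plain word frequency list could be produced. It is not the method from the linked discussion. It assumes the corpus is a bzip2-compressed UTF-8 text file (as the `corpus.txt.bz2` URL suggests), that words are separated by whitespace, and that lowercasing and stripping edge punctuation is acceptable; the function names and the local file path are hypothetical.

```python
import bz2
from collections import Counter

# Punctuation stripped from the edges of each token (an assumption; the
# apostrophe is kept because it is a letter in Lojban words like selma'o).
EDGE_PUNCTUATION = '.,;:!?()[]{}"'


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(EDGE_PUNCTUATION).lower()
            if word:
                counts[word] += 1
    return counts


def frequency_list(counts):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def count_corpus(path):
    """Count words in a bzip2-compressed text file (hypothetical local path)."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_words(handle)


# Small in-memory demonstration; no corpus file is needed for this part.
sample = ["mi klama le zarci", ".i mi prami do"]
for word, count in frequency_list(count_words(sample))[:3]:
    print(count, word)
```

With a downloaded copy of the corpus, `frequency_list(count_corpus("corpus.txt.bz2"))` would give the full list. Separating gismu, cmavo, lujvo and the other word classes, as the lists above do, would need a Lojban-aware classifier that this sketch does not attempt.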