From: "Apache"
Date: Sun, 21 Apr 2013 03:50:22 -0700
To: wikineurotic@lojban.org
Reply-To: webmaster@lojban.org
Subject: [Wikineurotic] Wiki page Word frequency lists changed by gleki

The page Word frequency lists was changed by gleki at 10:49 UTC

You can view the page by following this link:
http://www.lojban.org/tiki/Word%20frequency%20lists

You can view a diff back to the previous version by following this link:
http://www.lojban.org/tiki/tiki-pagehistory.php?page=Word%20frequency%20lists&compare=1&oldver=9&newver=10

***********************************************************
The changes in this version follow below, followed by the current full page text.
***********************************************************

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -Lines: 1-4 changed to +Lines: 1-4 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- ! Main lists (all words including cmavo clusters)
*[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list]
+ ! Main lists
*[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list (all words including cmavo clusters)]
*((Word Frequency Lists: gismu))
!How to generate lists yourself

***********************************************************
The new page content follows below.
***********************************************************

! Main lists
*[http://www.lojban.org/tiki/tiki-download_wiki_attachment.php?attId=934|Full list (all words including cmavo clusters)]
*((Word Frequency Lists: gismu))
!How to generate lists yourself
* See [https://groups.google.com/d/topic/lojban/KTPslnix3mQ/discussion|discussion] for details
* [http://www.lojban.org/corpus/corpus.txt.bz2|The Lojbanic corpus as a bzip2-compressed text file].
! Older stuff
* Older word frequencies can be found [http://www.lojban.org/files/roadmap.html#draft-dictionary_working|here]
* {file name=line-templates-by-frequency.txt showdesc=1} This is a sorted list of "sentence templates" excerpted from IRC. It shows which sequences of selma'o/word types are most common.
!! ((Robin Lee Powell))'s lists
[http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/big_list|gismu and cmavo frequency ordered word list], based on Lojban IRC, Alice, and a few other large texts. There is also a [http://teddyb.org/~rlpowell/hobbies/lojban/flashcards/|large selection of intermediary files], including pure frequency lists.
!! Rob Speer's lists
The following is about Rob Speer's frequency lists, which have fallen off the 'net. Some of them have been recovered and attached here.

The word frequency lists as of 2003/4/30. Stored on a separate server.

These frequency lists are drawn from a corpus containing the contents of the lojban.org/texts directory, most of this Wiki's ((texts in Lojban)), as many ((IRC)) logs as I could find, the texts on ((CVS)), and a large portion of the ((jbosnu)) archives.
I spent some time weeding out most of the English text, and tried to avoid picking up metalinguistic discussion (a word frequency list based on the main mailing list showed that ((lujvo)) is one of the most commonly used words).
* {file name=freq_gismu.txt showdesc=1}
* {file name=freq_cmavo21.txt showdesc=1}
* BROKEN LINK: [http://takeneggs.com/lojban/compounds.txt|cmavo compounds]
* BROKEN LINK: [http://takeneggs.com/lojban/lujvo_freq.txt|lujvo] (updated 2003/7/12; non-lujvo removed; malformed almost-lujvo marked with *)
* BROKEN LINK: [http://takeneggs.com/lojban/fuhivla_freq.txt|fu'ivla]
* BROKEN LINK: [http://takeneggs.com/lojban/cmene_freq.txt|cmene]

mi'e ((rab.spir))

_______________________________________________
Wikineurotic mailing list
Wikineurotic@lojban.org
http://mail.lojban.org/mailman/listinfo/wikineurotic