Date: Sat, 1 Nov 2008 22:42:36 +0100
From: Arnt Richard Johansen
To: lojban-list@lojban.org
Subject: [lojban] Sources for luj1999?
Message-ID: <20081101214236.GI2447@nvg.org>

http://www.lojban.org/publications/draft-dictionary/Working/luj1999.ZIP

This file contains lujvo that have been automatically excerpted from texts and semi-automatically converted into their canonical forms. It also contains frequency counts of these words.

What I would like to know is which source texts were used, and whether they are available somewhere.

To take a specific example, consider this line:

(2) cevyspe god+married canonical form=ceispe

This apparently means that the word "cevyspe" was used twice in the corpus. But a web search turns up nothing for "cevyspe" except an older word frequency list:

http://www.lojban.org/publications/wordlists/frequencies2.txt

What do I need in order to make sure that I have the context for every word that occurs in luj1999.zip?

-- 
Arnt Richard Johansen                                http://arj.nvg.org/
Keyboard: The Ultimate Input Device
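
For anyone who wants to check candidate source texts against the list, here is a minimal sketch of how a line like the one quoted above could be parsed and looked up. The field layout (count in parentheses, the lujvo as used, a gloss, an optional "canonical form=" field) is inferred from that single sample line only, so the real file may differ. The names LujvoEntry, parse_entry and find_contexts are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class LujvoEntry(NamedTuple):
    count: int                 # number in parentheses, read as a frequency
    lujvo: str                 # the form as it appeared in the texts
    gloss: str                 # e.g. "god+married"
    canonical: Optional[str]   # value of "canonical form=", if present


# Assumed layout, inferred from one sample line; adjust to the real file.
ENTRY_RE = re.compile(
    r"^\((?P<count>\d+)\)\s+"
    r"(?P<lujvo>\S+)\s+"
    r"(?P<gloss>.*?)"
    r"(?:\s+canonical form=(?P<canonical>\S+))?\s*$"
)


def parse_entry(line: str) -> Optional[LujvoEntry]:
    """Parse one list line; return None if it does not fit the assumed layout."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return LujvoEntry(int(m["count"]), m["lujvo"], m["gloss"], m["canonical"])


def find_contexts(entry: LujvoEntry, text: str, width: int = 30) -> List[str]:
    """Return snippets of `text` around each occurrence of either form.

    Uses plain word boundaries, which is only an approximation for Lojban
    text (apostrophes and commas inside words are not handled specially).
    """
    forms = {entry.lujvo}
    if entry.canonical:
        forms.add(entry.canonical)
    alternatives = "|".join(re.escape(f) for f in sorted(forms))
    pattern = re.compile(r"\b(?:%s)\b" % alternatives)
    return [
        text[max(0, m.start() - width): m.end() + width]
        for m in pattern.finditer(text)
    ]


entry = parse_entry("(2) cevyspe god+married canonical form=ceispe")
# entry == LujvoEntry(count=2, lujvo='cevyspe', gloss='god+married', canonical='ceispe')
```

Running find_contexts over each candidate text and comparing the number of hits with entry.count would give a rough indication of whether that text was part of the corpus. It cannot prove it, since the same word may occur in texts that were never used.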