From: Robin Lee Powell
To: lojban@googlegroups.com
Date: Tue, 30 Aug 2011 09:59:29 -0700
Subject: Behold the corups app (was Re: [lojban] HISTORIAN: What's up with this file?)

On Tue, Aug 30, 2011 at 09:20:15AM -0400, Robert LeChevalier wrote:
> Jorge Llambías wrote:
> >On Tue, Aug 30, 2011 at 4:18 AM, Robin Lee Powell
> > wrote:
> >
> >>Attached you'll find two gismu lists that appear to differ only
> >>in the weird columns in the middle; for example, in one (the
> >>normal gismu list that we've been providing all these years,
> >>attached as gu2), bacru has "1h 386".
> >>The other, which I found
> >>in a weird spot on the web server as gismu_updated.txt (attached
> >>as gu1), has bacru with "1h 405".
> >>
> >>Any idea what's going on there? gismu_updated.txt seems to have
> >>been last touched in 2002, and might easily be something I
> >>hacked together or something? I dunno.
> >
> >
> >I seem to remember those numbers were frequencies, so they
> >probably correspond to an updated corpus.
>
> That is correct. I don't recall what the reason for doing it was,
> but the gu1 list seems to be a later set of counts.
>
> It seems likely that usage since 2002 dwarfs what went before,
> so someone might want to generate new counts based on the
> corpora online, but I suggest that in future they be kept in
> a separate file from the official gismu list (in a different
> format, so this doesn't come up again).

http://www.lojban.org/cgi-bin/corpus/

That is one of the best things that anyone has done around here at
my request. Take a bow, purpleposeidon. :)

From there, scripting a frequency counter (as I have) is pretty
trivial.

-Robin

-- 
http://singinst.org/ : Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".
My personal page: http://www.digitalkingdom.org/rlp/
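As an illustration of how small such a counter can be, here is a minimal sketch in Python. It assumes the corpus text has already been saved locally as plain text and that you already have the set of gismu to look for. The tokenizing rule and the function name count_gismu are hypothetical. They are not how Robin's script or the corpus app actually work, and this does not reproduce the format of the "1h 386" columns.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical tokenizer (an assumption, not the corpus app's rule):
# a "word" is a run of lowercase letters and apostrophes. Lojban's
# '.' (pause) and ',' (syllable break) are treated as separators.
WORD_RE = re.compile(r"[a-z']+")


def count_gismu(text: str, gismu: Iterable[str]) -> Counter:
    """Count whole-token occurrences of each word in `gismu` within `text`."""
    wanted = set(gismu)
    counts: Counter = Counter()
    for token in WORD_RE.findall(text.lower()):
        if token in wanted:
            counts[token] += 1
    return counts


def count_gismu_in_file(path: str, gismu: Iterable[str]) -> Counter:
    """Hypothetical helper: count gismu in a locally saved corpus file."""
    with open(path, encoding="utf-8") as f:
        return count_gismu(f.read(), gismu)


if __name__ == "__main__":
    sample = "ti poi spitaki cu morsi .i lo spitaki cu bacru"
    counts = count_gismu(sample, ["bacru", "morsi", "klama"])
    # A Counter reports 0 for words never seen, such as klama here.
    for word in ["bacru", "morsi", "klama"]:
        print(word, counts[word])
```

Matching whole tokens against a set, rather than searching for substrings, means a gismu is not counted when its letters merely appear inside some longer token.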