From: Jay Kominek <jay.kominek@colorado.edu>
Date: Fri, 15 Mar 2002 15:41:30 -0700 (MST)
To: lojban@yahoogroups.com
Subject: Re: [lojban] lojban.org transfer, reprise.

On Fri, 15 Mar 2002, Jim Carter wrote:
> On Thu, 14 Mar 2002, Jay Kominek wrote:
> > Just a heads up, mail archiving software sucks.
>
> Glimpse produces indices that are about 10% the size of the indexed corpus.

Hmm. I had some other complaint with Glimpse. Wish I could remember it now.

> I think htdig has similar characteristics, but I only tried it out once as
> a demo. It's easy to set up, but it's oriented to a corpus that's already
> directly web accessible, vs. files to be spit out by a CGI script.

I use htDig for the old list archives.

jkominek@balance ~/public_html/lojban $ du -sh db*
42M     db.docdb
1.3M    db.docs.index
64M     db.wordlist
58M     db.words.db
= 165.3MB

Whereas the data being indexed is 62.3MB. (Sorry, misremembered.)

It'd be sort of nice to have access to something like Thunderstone Texis
for this, even though I hear it is a complete pain.
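For comparison with the 10% figure quoted for Glimpse, here is a rough sketch of the same arithmetic for the htDig numbers above. The sizes are copied from the du output and the 62.3MB corpus figure; the names (index_sizes_mb, corpus_mb, index_overhead) are made up for illustration, and since du -sh rounds its output the result is only approximate.

```python
# Index file sizes in MB, copied from the `du -sh db*` output above.
# (du -sh rounds, so these are approximate.)
index_sizes_mb = {
    "db.docdb": 42.0,
    "db.docs.index": 1.3,
    "db.wordlist": 64.0,
    "db.words.db": 58.0,
}

# Size of the mail being indexed, as stated above.
corpus_mb = 62.3


def index_overhead(sizes, corpus):
    """Return (total index size, index size as a multiple of the corpus)."""
    total = sum(sizes.values())
    return total, total / corpus


total_mb, ratio = index_overhead(index_sizes_mb, corpus_mb)
print(f"index total: {total_mb:.1f} MB, about {ratio:.2f}x the corpus")
```

So the htDig databases come to roughly two and a half times the size of the mail they index, not a fraction of it.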
- Jay Kominek
  Plus ça change, plus c'est la même chose