From jimc@MATH.UCLA.EDU Fri Mar 15 14:14:33 2002
Return-Path:
X-Sender: jimc@math.ucla.edu
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: unknown); 15 Mar 2002 22:14:33 -0000
Received: (qmail 29479 invoked from network); 15 Mar 2002 22:14:32 -0000
Received: from unknown (216.115.97.167) by m8.grp.snv.yahoo.com with QMQP; 15 Mar 2002 22:14:32 -0000
Received: from unknown (HELO bodhi.math.ucla.edu) (128.97.4.253) by mta1.grp.snv.yahoo.com with SMTP; 15 Mar 2002 22:14:32 -0000
Received: from localhost (bodhi.math.ucla.edu [128.97.4.253]) by bodhi.math.ucla.edu (8.8.8/8.8.8) with ESMTP id OAA00649 for ; Fri, 15 Mar 2002 14:14:31 -0800 (PST)
Date: Fri, 15 Mar 2002 14:14:43 -0800 (PST)
Sender:
To:
Subject: Re: [lojban] lojban.org transfer, reprise.
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Jim Carter
X-Yahoo-Group-Post: member; u=810565

On Thu, 14 Mar 2002, Jay Kominek wrote:

> On Thu, 14 Mar 2002, Robin Lee Powell wrote:
> > > and we have a huge bank of archives that are nicely searchable.
> >
> > Trivially reproducable.
>
> Just a heads up, mail archiving software sucks.
>
> Searching systems suck, too. The indexing and stuff for just the static
> content of the old list archive I've got, is hundreds of megabytes. (There
> are maybe 10 megabytes of emails or so?)

Glimpse produces indices that are about 10% the size of the indexed corpus.
Likely there's a tradeoff between disc space vs. CPU time expended to do
each search. Glimpse, at least, has got to be delivering a lot of pointers
to documents that don't really contain the keywords, and which the back end
has to weed out, which might have been avoided if more disc space had been
used for a more intricate index.

I think htdig has similar characteristics, but I only tried it out once as a
demo. It's easy to set up, but it's oriented to a corpus that's already
directly web accessible, vs. files to be spit out by a CGI script.

James F. Carter        Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA  90095-1555
Email: jimc@math.ucla.edu    http://www.math.ucla.edu/~jimc (q.v. for PGP key)
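
The disc space vs. CPU time tradeoff described above can be illustrated with a toy sketch. This is not Glimpse's actual code or file format; the function names (build_coarse_index, search), the block_size parameter, and the sample documents are all hypothetical. The idea is only that a coarse index points at blocks of documents rather than at individual documents, so the search has to scan every document in each candidate block and weed out the ones that don't really contain the keywords.

```python
from collections import defaultdict


def build_coarse_index(docs, block_size):
    """Map each word to the set of block numbers in which it occurs.

    A block is a run of `block_size` consecutive documents (an assumption
    made for this sketch). Larger blocks generally mean fewer pointers to
    store, at the cost of more scanning at search time.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        block = doc_id // block_size
        for word in text.lower().split():
            index[word].add(block)
    return index


def search(docs, index, block_size, keywords):
    """Return (ids of documents containing every keyword, documents scanned)."""
    keywords = [k.lower() for k in keywords]

    # Candidate blocks: those where every keyword occurs somewhere in the
    # block, though not necessarily in the same document.
    candidates = None
    for k in keywords:
        blocks = index.get(k, set())
        candidates = blocks if candidates is None else candidates & blocks

    hits, scanned = [], 0
    for block in sorted(candidates or ()):
        start = block * block_size
        for doc_id in range(start, min(start + block_size, len(docs))):
            scanned += 1  # CPU time spent weeding out false positives
            words = set(docs[doc_id].lower().split())
            if all(k in words for k in keywords):
                hits.append(doc_id)
    return hits, scanned


docs = [
    "lojban grammar notes",
    "mail archive software",
    "glimpse index size",
    "lojban archive search",
]

coarse = build_coarse_index(docs, block_size=2)
print(search(docs, coarse, 2, ["lojban", "archive"]))  # ([3], 4)

fine = build_coarse_index(docs, block_size=1)
print(search(docs, fine, 1, ["lojban", "archive"]))    # ([3], 1)
```

Both runs find the same document, but the coarse index forces a scan of all four documents while the per-document index scans only one. On a corpus this small the two indices are the same size; the space saving from coarser blocks would only show up on a larger corpus where common words recur many times within a block.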