Re: [lojban] lojban.org transfer, reprise.
On Thu, 14 Mar 2002, Jay Kominek wrote:
> On Thu, 14 Mar 2002, Robin Lee Powell wrote:
> > > and we have a huge bank of archives that are nicely searchable.
> >
> > Trivially reproducible.
>
> Just a heads up, mail archiving software sucks.
>
> Searching systems suck, too. The indexing and stuff for just the static
> content of the old list archive I've got, is hundreds of megabytes. (There
> are maybe 10 megabytes of emails or so?)
Glimpse produces indices that are about 10% the size of the indexed corpus.
Likely there's a tradeoff between disc space and CPU time expended on
each search. With an index that small, Glimpse at least has got to be
delivering a lot of pointers to documents that don't really contain the
keywords, which the back end then has to weed out. That might have been
avoided if more disc space had been used for a more intricate index.
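
To make the tradeoff concrete, here is a toy sketch in Python. It is not Glimpse's actual code or on-disc format, just an illustration of the general idea: the index records only which block of documents a word occurs in, and the search step has to scan every document in the candidate blocks to throw out the false hits. The function names and the block_size parameter are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_coarse_index(docs, block_size):
    """Map each word to the set of block numbers it occurs in.

    A block is a run of block_size consecutive documents. Larger blocks
    mean fewer distinct pointers (a smaller index) but less precision.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in tokenize(text):
            index[word].add(doc_id // block_size)
    return index


def search(docs, index, block_size, words):
    """Return (matching document ids, number of documents scanned)."""
    words = [w.lower() for w in words]
    if not words:
        return [], 0
    # Step 1: the cheap index lookup only narrows things down to blocks.
    blocks = set.intersection(*(index.get(w, set()) for w in words))
    # Step 2: the "back end" scans each candidate document and weeds out
    # those that do not really contain every keyword.
    hits, scanned = [], 0
    for block in sorted(blocks):
        start = block * block_size
        stop = min(start + block_size, len(docs))
        for doc_id in range(start, stop):
            scanned += 1
            if all(w in tokenize(docs[doc_id]) for w in words):
                hits.append(doc_id)
    return hits, scanned
```

With a block size of 1 the index points straight at documents and nothing is scanned needlessly; with bigger blocks the index shrinks but each query scans more documents, which is the disc space vs. CPU time trade described above.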
I think htdig has similar characteristics, but I only tried it out once as
a demo. It's easy to set up, but it's oriented to a corpus that's already
directly web accessible, rather than files spit out by a CGI script.
James F. Carter Voice 310 825 2897 FAX 310 206 6673
UCLA-Mathnet; 6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA 90095-1555
Email: jimc@math.ucla.edu http://www.math.ucla.edu/~jimc (q.v. for PGP key)