Subject: Re: [lojban] Site Map generating tools?
From: John Cowan <jcowan@reutershealth.com>
To: rlpowell@digitalkingdom.org
Cc: lojban-list@lojban.org
Date: Fri, 9 Aug 2002 22:05:15 -0400 (EDT)

Robin Lee Powell scripsit:

> I would really like a tool to automatically generate a site map for
> lojban.org (or whatever). Basically, I want something to spider a few
> levels of pages and make links to them based on their title tags. Or
> something.

In an empty directory, do this:

    wget -l3 -q -r -nd http://www.lojban.org   # root page plus 3 levels of links
    mkdir scratch
    for i in *.html
    do
        tr -s '\r\n' ' ' <"$i" >"scratch/$i"   # flatten each page onto one line
    done
    mv scratch/* .
    sgrep -o '%f:%r\n' '""..""' *.html

sgrep writes something like this to standard output:

    about.html: About Lojban
    advanced.html:Lojban For Advanced Students
    beginners.html:Lojban For Beginners
    index.html: Official Lojban Home Page
    learning.html: Learning Lojban
    llg.html: About The Logical Language Group
    reading.html: Reading Lojban
    resources.html: Lojban Resources
    whatis.html: What Is Lojban?
    where.html:Where Is Lojban Spoken?
    why_learn.html: Why Learn Lojban?

which you can then bash into shape with your favorite text-processing
language (awk, sed, Perl, Python, ...).

-- 
John Cowan    jcowan@reutershealth.com
http://www.ccil.org/~cowan    http://www.reutershealth.com
Thor Heyerdahl recounts his attempt to prove Rudyard Kipling's theory
that the mongoose first came to India on a raft from Polynesia.
        --blurb for _Rikki-Kon-Tiki-Tavi_
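The final step, bashing the sgrep output into shape, is left open in the message. As one possible illustration (a sketch, not part of the original post), here is a POSIX shell function that turns the `file:title` lines into an HTML bullet list of links. The function name `make_sitemap` and the `<ul>`/`<li>` layout are assumptions. Titles are passed through unescaped, on the assumption that they were already valid HTML text in the source pages; a page with an empty title falls back to its file name.

```shell
#!/bin/sh
# Hypothetical helper: reads "file.html:Title" lines (the format printed by
# sgrep -o '%f:%r\n') on standard input and writes an HTML link list to
# standard output.
make_sitemap() {
    echo '<ul>'
    # "|| [ -n "$line" ]" also handles a final line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        file=${line%%:*}   # everything before the first colon
        title=${line#*:}   # everything after the first colon
        # trim the leading/trailing blanks left over from the tr step
        title=$(printf '%s' "$title" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        [ -n "$title" ] || title=$file   # empty title: use the file name
        printf '  <li><a href="%s">%s</a></li>\n' "$file" "$title"
    done
    echo '</ul>'
}
```

To use it, source the function and append `| make_sitemap > sitemap.html` to the sgrep command. Doing the split with shell parameter expansion keeps everything after the first colon in the title, so titles that themselves contain colons survive intact.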