From rlpowell@digitalkingdom.org Mon Aug 12 14:01:10 2002
Received: with ECARTIS (v1.0.0; list lojban-list); Mon, 12 Aug 2002 14:01:10 -0700 (PDT)
Received: from rlpowell by chain.digitalkingdom.org with local (Exim 4.05)
	id 17eMJI-0003j0-00
	for lojban-list@lojban.org; Mon, 12 Aug 2002 14:01:08 -0700
Date: Mon, 12 Aug 2002 14:01:08 -0700
From: Robin Lee Powell
To: lojban-list@lojban.org
Subject: Re: [lojban] Site Map generating tools?
Message-ID: <20020812210108.GY13664@chain.digitalkingdom.org>
Mail-Followup-To: lojban-list@lojban.org
References: <20020809232942.GE21530@chain.digitalkingdom.org> <200208100218.WAA06084@mail2.reutershealth.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="rwEMma7ioTxnRzrJ"
Content-Disposition: inline
In-Reply-To: <200208100218.WAA06084@mail2.reutershealth.com>
User-Agent: Mutt/1.4i
X-archive-position: 571
X-ecartis-version: Ecartis v1.0.0
Sender: lojban-list-bounce@lojban.org
Errors-to: lojban-list-bounce@lojban.org
X-original-sender: rlpowell@digitalkingdom.org
Precedence: bulk
Reply-to: lojban-list@lojban.org
X-list: lojban-list

--rwEMma7ioTxnRzrJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Aug 09, 2002 at 10:05:15PM -0400, John Cowan wrote:
> Robin Lee Powell scripsit:
>
> > I would really like a tool to automatically generate a site map for
> > lojban.org (or whatever).  Basically, I want something to spider a few
> > levels of pages and make links to them based on their title tags.  Or
> > something.
>
> In an empty directory, do this:

That wasn't quite doing what I wanted.  The script I eventually came up
with is attached; it does the whole site map as is currently on the
site, start to finish.

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/    BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno je xlali
    -- RLP                                  http://www.lojban.org/

--rwEMma7ioTxnRzrJ
Content-Type: application/x-sh
Content-Disposition: attachment; filename="sitemap.sh"

#!/bin/sh
#
# lojban-web - The official lojban website.
# Version: .C050
# Copyright (C) 2002 Robin Lee Powell.  All rights reserved.
# Written by rlpowell@digitalkingdom.org
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1.  Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#
# 2.  Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# Shell script to generate a site map
#

startdir=`pwd`

mkdir /tmp/sitemap.$$

cd /tmp/sitemap.$$

echo "Wgetting files."
# -r -l 7: recurse at most 7 levels deep from the root page
wget -q -l 7 -r --exclude-directories="/~rlpowell,/files,/cgi-bin,/stats" http://www.lojban.org

echo "Removing CR/LF."

# Collapse each page onto one line so a title or link never spans lines
for file in `find . -name '*.html'`
do
    cat $file | tr -s '[\r\n]' ' ' >$file.$$
    mv $file.$$ $file
done

echo "Grabbing titles."

# One "file:<title>...</title>" line per page
find . -name '*.html' -exec sgrep -o '%f:%r\n' \
'"<title>".."</title>"' {} \; > sitemap.1

echo "Grabbing hrefs."

for file in `find . -name '*.html'`
do
    short=`echo $file | sed 's!./www.lojban.org/!!'`
    #echo $short
    find . -name '*.html' -exec sgrep -o '%f:%r\n' \
    "\"\" containing \"$short\"" \
    {} \; >> sitemap.1

done

echo "Sorting."
# Rename <title> to <1title> so each page's title line sorts first
cat sitemap.1 | sed 's/<title>/<1title>/' | sort | uniq | tee sitemap.2

# Inner double quotes are escaped so they reach top.html intact
echo "
<!DOCTYPE html PUBLIC
    \"-//W3C//DTD XHTML 1.0 Strict//EN\"
    \"DTD/xhtml1-strict.dtd\" >
<html xmlns = \"http://www.w3.org/1999/xhtml\" lang=\"en\">
<head>
<title>
    www.lojban.org Site Map
</title>

<!-- #LWEB#HEADER# -->

<h1>www.lojban.org Site Map</h1>

<ul><ul>
" >top.html
echo "</ul>
<!-- #LWEB#FOOTER# -->
" >bottom.html

echo "Creating lists."
cat top.html sitemap.2 bottom.html | \
sed 's!./www.lojban.org/\(.*\):<1title>\(.*\)!
  • \2