From lojban-out@lojban.org Mon Aug 12 14:01:50 2002
Date: Mon, 12 Aug 2002 14:01:08 -0700
To: lojban-list@lojban.org
Subject: Re: [lojban] Site Map generating tools?
Message-ID: <20020812210108.GY13664@chain.digitalkingdom.org>
Mail-Followup-To: lojban-list@lojban.org
References: <20020809232942.GE21530@chain.digitalkingdom.org> <200208100218.WAA06084@mail2.reutershealth.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="rwEMma7ioTxnRzrJ"
Content-Disposition: inline
In-Reply-To: <200208100218.WAA06084@mail2.reutershealth.com>
User-Agent: Mutt/1.4i
X-archive-position: 571
X-ecartis-version: Ecartis v1.0.0
Sender: lojban-list-bounce@lojban.org
Errors-to: lojban-list-bounce@lojban.org
X-original-sender: rlpowell@digitalkingdom.org
Precedence: bulk
X-list: lojban-list
X-eGroups-From: Robin Lee Powell <rlpowell@digitalkingdom.org>
From: Robin Lee Powell <lojban-out@lojban.org>
Reply-To: rlpowell@digitalkingdom.org
X-Yahoo-Group-Post: member; u=116389790
X-Yahoo-Profile: lojban_out

--rwEMma7ioTxnRzrJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Aug 09, 2002 at 10:05:15PM -0400, John Cowan wrote:
> Robin Lee Powell scripsit:
> 
> > I would really like a tool to automatically generate a site map for
> > lojban.org (or whatever). Basically, I want something to spider a few
> > levels of pages and make links to them based on their title tags. Or
> > something.
> 
> In an empty directory, do this:

That wasn't quite doing what I wanted. The script I eventually came up
with is attached; it generates the whole site map as it currently appears
on the site, start to finish.
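
For anyone who only wants the core idea rather than the full script,
here is a minimal sketch of the title-to-link step. It assumes the pages
have already been mirrored into a local directory (for example with
wget -r), and title_links is a hypothetical helper name that does not
appear in the attached script. Unlike the attachment it uses only find,
tr and sed rather than sgrep, so treat it as an illustration under those
assumptions, not as the method the attachment uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached sitemap.sh): print one
# <li> link per HTML file under the directory given as $1, using each
# file's <title> text as the link text.
title_links() {
    (
        cd "$1" || exit 1
        find . -name '*.html' | sort | while IFS= read -r file
        do
            # Flatten the file to a single line so that a title spanning
            # several lines still matches, then trim surrounding spaces.
            title=`tr '\r\n' '  ' <"$file" |
                sed -n 's!.*<title>[ ]*\(.*[^ ]\)[ ]*</title>.*!\1!p'`
            if [ -n "$title" ]
            then
                # ${file#./} drops the leading "./" that find adds.
                echo "<li><a href=\"${file#./}\">$title</a></li>"
            fi
        done
    )
}
```

Running title_links www.lojban.org from the directory that holds the
mirror should print one list item for each page that has a lowercase
<title> element; pages without one are skipped.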

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno
je xlali -- RLP http://www.lojban.org/

--rwEMma7ioTxnRzrJ
Content-Type: application/x-sh
Content-Disposition: attachment; filename="sitemap.sh"
Content-Transfer-Encoding: 7bit

#!/bin/sh
#
# lojban-web - The official lojban website.
#	Version: .C050
# Copyright (C) 2002 Robin Lee Powell. All rights reserved.
# Written by rlpowell@digitalkingdom.org
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# Shell script to generate a site map
#

startdir=`pwd`

# Work in a scratch directory; stop if it cannot be created or entered.
mkdir /tmp/sitemap.$$ || exit 1

cd /tmp/sitemap.$$ || exit 1

echo "Wgetting files."
wget -q -l 7 -r --exclude-directories="/~rlpowell,/files,/cgi-bin,/stats" http://www.lojban.org # recurse up to 7 levels deep

echo "Removing CR/LF."

# Flatten each page to one line so multi-line tags can be matched.
# (No brackets in the tr set: they would be translated as literal characters.)
for file in `find . -name '*.html'`
do
 cat $file | tr -s '\r\n' ' ' >$file.$$
 mv $file.$$ $file
done

echo "Grabbing titles."

find . -name '*.html' -exec sgrep -o '%f:%r\n' \
'"<title>".."</title>"' {} \; > sitemap.1

echo "Grabbing hrefs."

# For every page, collect the links in any page that point at it.
for file in `find . -name '*.html'`
do
 short=`echo $file | sed 's!./www.lojban.org/!!'`
 #echo $short
 find . -name '*.html' -exec sgrep -o '%f:%r\n' \
 "\"<a href=\"..\"</a>\" containing \"$short\"" \
 {} \; >> sitemap.1

done

echo "Sorting."
# Renaming <title> to <1title> makes each page's title sort before its hrefs.
cat sitemap.1 | sed 's/<title>/<1title>/' | sort | uniq | tee sitemap.2

# A quoted here-document keeps the double quotes in the DOCTYPE and
# attributes, which the shell would strip from a double-quoted echo.
cat >top.html <<'EOF'
<!DOCTYPE html PUBLIC
 "-//W3C//DTD XHTML 1.0 Strict//EN"
 "DTD/xhtml1-strict.dtd" >
<html xmlns = "http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>
 www.lojban.org Site Map
</title>

<!-- #LWEB#HEADER# -->

<h1>www.lojban.org Site Map</h1>

<ul><ul>
EOF
echo "</ul>
<!-- #LWEB#FOOTER# -->
" >bottom.html

echo "Creating lists."
cat top.html sitemap.2 bottom.html | \
sed 's!./www.lojban.org/\(.*\):<1title>\(.*\)</title>!</ul><li><a href="\1">\2</a></li><ul>!' |\
sed 's!./www.lojban.org/.*:\(<a href=.*</a>\)!<li>\1</li>!' |\
tee sitemap.3

echo "Weeding out extras."
# Drop the heading entry of any page whose nested link list is empty.
IFS="
"
for line in `cat sitemap.3 | grep '^<li><a href=[^>]*>[^<]*</a></li>$'`
do
 file=`echo $line | sed 's/<li><a href="\([^"]*\)".*/\1/'`
 IFS="
	"
 grepnum=`cat sitemap.3 | egrep -n "^</ul><li><a href=.$file.>[^<]*</a></li><ul>" | sed 's/:.*//'`
 if [ "$grepnum" ]
 then
	grepnum=`expr $grepnum + 1`
	sedtest=`cat sitemap.3 | sed "${grepnum}s1</ul>.*1yes1
	${grepnum}!d"`
	IFS="
"
	if [ "$sedtest" = "yes" ]
	then
		grepnum=`expr $grepnum - 1`
		cat sitemap.3 | sed "${grepnum}d" >sitemap.3.$$
		mv sitemap.3.$$ sitemap.3
	fi
 fi
done

cp sitemap.3 sitemap.html

cp sitemap.html $startdir/htdocs/sitemap.in.html

exit 0
--rwEMma7ioTxnRzrJ--

