
Re: [lojban] Site Map generating tools?



Robin Lee Powell scripsit:

> I would really like a tool to automatically generate a site map for
> lojban.org (or whatever).  Basically, I want something to spider a few
> levels of pages and make links to them based on their title tags.  Or
> something.

In an empty directory, do this:

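wget -l3 -q -r -nd http://www.lojban.org	# does 3 levels including root
mkdir scratch
# Join each page onto one line so a <title> split across lines still matches.
# (No brackets in the tr set: most tr versions would treat them as literals.)
for i in *.html
do	tr -s '\r\n' ' ' <"$i" >"scratch/$i"
done
mv scratch/* .
# Print "filename:<title>...</title>" for each title region found
sgrep -o '%f:%r\n' '"<title>".."</title>"' *.html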

sgrep then writes lines like these to standard output:

about.html:<title>     About Lojban </title>
advanced.html:<title>Lojban For Advanced Students</title>
beginners.html:<title>Lojban For Beginners</title>
index.html:<title>      Official Lojban Home Page     </title>
learning.html:<title>     Learning Lojban </title>
llg.html:<title>     About The Logical Language Group </title>
reading.html:<title>     Reading Lojban </title>
resources.html:<title>     Lojban Resources </title>
whatis.html:<title>     What Is Lojban? </title>
where.html:<title>Where Is Lojban Spoken?</title>
why_learn.html:<title>     Why Learn Lojban? </title>

which you can then bash into shape with your favorite text processing
language (any of awk, sed, Perl, Python, ...).
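
As one possible way to do that last step, here is a minimal awk sketch that
turns the "filename:<title>...</title>" lines into an HTML list of links.
The function name titles_to_sitemap is made up for illustration, and the
sketch assumes one title per line and no colon in the file names.

```shell
# Hypothetical helper (not part of the original recipe): reads
# "file:<title> Some Title </title>" lines on stdin, writes an HTML list.
titles_to_sitemap() {
    awk '
    BEGIN { print "<ul>" }
    {
        sep = index($0, ":")                # first colon ends the file name
        if (sep == 0) next                  # skip lines without one
        file  = substr($0, 1, sep - 1)
        title = substr($0, sep + 1)
        gsub(/<\/?title>/, "", title)       # drop the <title> tags
        gsub(/^[ \t]+|[ \t]+$/, "", title)  # trim surrounding padding
        printf "  <li><a href=\"%s\">%s</a></li>\n", file, title
    }
    END { print "</ul>" }'
}
```

You would pipe the sgrep command above into it, for example
"sgrep ... *.html | titles_to_sitemap >../sitemap.html", writing the result
outside the working directory so a later run does not pick it up as a page.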


-- 
John Cowan                              jcowan@reutershealth.com
http://www.ccil.org/~cowan              http://www.reutershealth.com
Thor Heyerdahl recounts his attempt to prove Rudyard Kipling's theory
that the mongoose first came to India on a raft from Polynesia.
	--blurb for _Rikki-Kon-Tiki-Tavi_