Dylan Parry wrote:
> Hi folks,
> I was wondering if anyone here knows of any good software that can be
> used to generate a Google Sitemap? I currently use Endsheet
> (http://www.tipue.com/endsheet/), but it seems to have one or two
> problems such as indexing pages that don't actually exist! It's also not
> primarily for creating Google Sitemaps, so I think this function was
> merely bolted on as an afterthought :\
> Preferably, the software will be open source and/or free, or at the
> least there will be a trial version available that I can test drive for
> a couple of weeks before buying.
> The ideal application would be one that actually spiders my sites rather
> than scans a log file or stats page, as not all of my pages will have
> been visited before they need to be in the sitemap. I also want to be
> able to exclude files/directories from being spidered by the
> application, eg. http://example.com/forums/.
> Anyone got a suggestion or two?
If all you need is a list of URLs (no titles) in an XML tree, you can
fetch all .htm and .html files (or just take a local copy of your site and
filter by filetype). Here I am assuming that HTML files represent your main
pages (as opposed to, say, .c or .txt files).
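The filtering step can be sketched with 'find'. The directory layout below is
invented for illustration; the point is the -prune clause, which also gives you
the /forums exclusion you asked about:

```shell
# Sketch only: fake a small local copy of a site so the command is runnable.
mkdir -p site/forums
touch site/index.html site/about.htm site/notes.txt site/forums/thread.html

# List only the HTML pages, skipping the forums directory entirely:
find site -path 'site/forums' -prune -o \
     \( -name '*.html' -o -name '*.htm' \) -print | sort
```

This prints site/about.htm and site/index.html, but not the .txt file and
nothing under site/forums.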
Since you might generate some pages on the fly, you may also wish to run
'wget' on a Linux/Mac box, or on Windows with Cygwin. This will essentially
fetch all content recursively, saving every page to your hard drive in a
directory structure that mirrors the URLs. You can then use 'find' to list
the files and filter the output, so you get only what counts as a page worth
indexing. Putting that list into XML form should not be too hard. I know it's
not perfect, and not simple, but it's free, and at the end of the day you have
a tool that you can share among your peers.
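The whole pipeline might look something like this. example.com is a
placeholder, so the wget step is shown but commented out and a fake mirror
stands in for it; adjust the exclusion list and URL scheme to your site:

```shell
# 1. Mirror the site (run this for real; wget's -X flag excludes directories):
# wget --mirror --no-parent -X /forums http://example.com/

# For illustration, fake a tiny mirror instead of fetching anything:
mkdir -p example.com
touch example.com/index.html example.com/contact.htm

# 2. Turn the file listing into a minimal Sitemap-protocol XML file,
#    using the path on disk as the URL path:
{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  find example.com \( -name '*.html' -o -name '*.htm' \) -print | sort |
    sed 's|^|  <url><loc>http://|; s|$|</loc></url>|'
  echo '</urlset>'
} > sitemap.xml
```

It only emits <loc> elements; optional fields such as <lastmod> would need
extra plumbing (e.g. reading file timestamps), but Google accepts a
loc-only sitemap.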
Roy S. Schestowitz