__/ [ Catherine Milton ] on Friday 24 March 2006 08:43 \__
> "Jez" <j.ez@xxxxxxxxxx> wrote in message
>> tonnie wrote:
>> > Jez wrote:
>> >> How do I stop google indexing thousands of useless pages such as this
>> >> one using robots.txt?
>> >> http://www.absolutedirectory.com/cgi-bin/rate.cgi?ID=10567
>> >> I have never tried to use it before and can't think of a better way to
>> >> do it.
>> >> I appreciate your help!
>> > Hi Jez,
>> > Hm, /cgi-bin/ is a directory I would close to all robots (at least
>> > the ones that actually honour robots.txt):
>> > User-agent: *
>> > Disallow: /cgi-bin/
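That rule will cover the rate.cgi URLs too. If you would rather keep the
rest of /cgi-bin/ crawlable and block only the rating script, you could
write the file like this (a sketch; the path is assumed from the URL in
your post):

```shell
# Write a robots.txt that blocks only the rating script, not all of /cgi-bin/
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /cgi-bin/rate.cgi
EOF
cat robots.txt
```

Most well-behaved crawlers treat Disallow as a prefix match, so this also
covers query-string variants like rate.cgi?ID=10567.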
> Since we're on the topic of robots.txt: I looked at my website stats as
> provided by my host, and in the list of pages not found, 404, etc, it shows
> the robots.txt as not found VERY often! I haven't even tried to find out
> any more about it, but maybe this could be causing a problem?
No. It shouldn't be a cause for concern unless there are pages you want to
exclude from search engine indices. To suppress these 404 errors, I strongly
advise you to create a file called robots.txt, ensure it _is empty_, and
place it in your site's root directory.
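If your host gives you shell access, that empty file is one command away
(a sketch; "public_html" is an assumption, so substitute whatever your
host calls the document root):

```shell
# Create an empty robots.txt in the site's document root
# ("public_html" is an assumption; adjust to your host's layout)
mkdir -p public_html
touch public_html/robots.txt
```

An empty file means "nothing is disallowed", so crawling is unaffected;
the robots simply stop generating 404 entries in your logs.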
> I try to check this list on a regular basis in case something gets lost,
> and it's always been the same - the majority of the 'not found's is that
> page, even though it does actually exist.
I do the same thing. If you reduce the level of 'noise' (false alarms)
sufficiently, this can help you tremendously in detecting bad code,
broken links and mischievous attempts to attack your site.
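For instance, on a shell account you can do the filtering yourself
(a sketch; the log filename and format are assumptions modelled on a
common Apache-style access log):

```shell
# Build a tiny sample access log (the format is an assumption, modelled
# on Apache's common log format) to demonstrate filtering out the noise
cat > access.sample <<'EOF'
1.2.3.4 - - [24/Mar/2006:08:43:00 +0000] "GET /robots.txt HTTP/1.1" 404 209
1.2.3.4 - - [24/Mar/2006:08:44:00 +0000] "GET /old-page.html HTTP/1.1" 404 209
1.2.3.4 - - [24/Mar/2006:08:45:00 +0000] "GET /index.html HTTP/1.1" 200 5120
EOF
# Show genuine 404s, dropping the robots.txt false alarms
grep ' 404 ' access.sample | grep -v 'robots\.txt'
```

Once the robots.txt noise is gone, anything left in that list is worth
investigating.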
Hope it helps,
Roy S. Schestowitz | Useless fact: Digits 772-777 of Pi are 999999
http://Schestowitz.com | SuSE Linux ¦ PGP-Key: 0x74572E8E
11:05am up 16 days 3:42, 9 users, load average: 1.38, 1.21, 1.11
http://iuron.com - Open Source knowledge engine project