
Re: Google going overboard.

__/ [Borek] on Monday 06 February 2006 09:34 \__

> On Mon, 06 Feb 2006 10:03:32 +0100, Doug Laidlaw
> <laidlaws@xxxxxxxxxxxxxxx> wrote:
> 
>> I have a Web site with a php subdirectory and an HTML subdirectory.  My
>> robots.txt excludes access to the php subdirectory, and other search
>> engines seem to be respecting that.  Google however is not only
>> downloading the php files, but according to the logs, they are
>> generating reports to save.  The result is that both I and my host's
>> home site have already used up our month's allocation.  I complained to
>> them last night (our time) by email, but the logs show them back at it
>> at 4 p.m. today.
> 
> Show the domain name.


Yes, I imagine you have a robots.txt with:

User-agent: *
Disallow: /php

but we probably ought to see it for ourselves. If you only added that
exclusion rule recently, Google may still be working from an older copy of
robots.txt, so you might need to prompt them to pick up the change. Pages
that were crawled before the rule existed stay in the index and are
difficult to remove. Once the rule is picked up, though, they should not be
re-crawled.
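
To check what a rule like the one above actually blocks, here is a minimal
sketch using Python's standard urllib.robotparser. The robots.txt text is
only my guess at your file, and the example.com URLs and the is_allowed
helper are made up for illustration; substitute your real file and paths.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of robots.txt (a guess, not the poster's actual file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /php
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    for url in (
        "http://example.com/php/report.php",
        "http://example.com/html/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that Disallow matches by path prefix, so "/php" also covers anything
whose path merely starts with those characters. If the parser says the
path is blocked and the logs still show fetches, the crawler either has
not re-read robots.txt yet or is not honouring it.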


> Why do you think it is Google problem?


It sounds possible that Google is merely the first engine to reach those
pages. It is usually the first engine to crawl deep into a site.


> Best,
> Borek

-- 
Roy S. Schestowitz      |    Prevalence does not imply ideali$M
http://Schestowitz.com  |    SuSE Linux     |     PGP-Key: 0x74572E8E
  9:45am  up 20 days  5:01,  10 users,  load average: 1.60, 0.57, 0.21
      http://iuron.com - help build a non-profit search engine
