
Re: Google Sitemap: unexpected implication

__/ [www.1-script.com] on Sunday 04 September 2005 17:46 \__

> Hello everyone!
> While messing with XML site maps, I got an interesting error message from
> Google: something along the lines of "your server returns 200 response on
> missing document, cannot proceed until 404 is returned". Of course it
> does, silly, I'm using custom error pages! So, what now, no custom error
> pages if you're using a sitemap? It makes sense, obviously: big G does not
> want to analyze content of every stinking page while expecting a large
> amount of errors in people's sitemaps pointing to non-existing pages. It's
> obviously easier for them to just look at the server's response and drop
> those that return 404, but what is a webmaster to do now? I've been happily
> using custom error pages for years now and do not want to give up on them.
> 
> Anyone's got a creative workaround here?

Can you sniff the user agent (crawler or human) and respond accordingly? I
suppose you could embed some conditional statement in 404.shtml (Apache?),
but I am still not sure what will happen as far as the response status is
concerned.
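
To illustrate the status question: a custom error page and a 404 status are
not mutually exclusive, since the status line and the body are set
separately. Below is a minimal sketch in Python (WSGI), not Apache SSI. The
names KNOWN_PAGES, CUSTOM_404_BODY, app and fetch are all hypothetical and
exist only for this example.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical toy site: the only path that exists is "/".
KNOWN_PAGES = {"/": "<h1>Home</h1>"}

# Hypothetical friendly error page shown to humans and crawlers alike.
CUSTOM_404_BODY = "<h1>Sorry, that page is missing</h1>"


def app(environ, start_response):
    """Serve known pages with 200; serve the custom page with a real 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        status, body = "200 OK", KNOWN_PAGES[path]
    else:
        # The body is customised, but the status line still says 404.
        status, body = "404 Not Found", CUSTOM_404_BODY
    data = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ]
    start_response(status, headers)
    return [data]


def fetch(path):
    """Call the app in-process and return (status, body) for a path."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    body = b"".join(app(environ, start_response)).decode("utf-8")
    return captured["status"], body
```

Calling fetch("/no-such-page") returns the friendly page together with a
"404 Not Found" status, which is the combination a crawler needs to see.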

I am fairly sure that a 404 shows up in my error logs regardless of who
makes the request. Even if big G follows a dead link, it will receive a 404
despite the heavily customised page. How it then treats the page I am not
entirely sure; given the status, I don't think it ever fetches or handles
the content at all. I suspect that Yahoo keeps retrying the exact same
broken link every 2 hours or so, even after you have fixed it. That
behaviour can become a real nuisance. Be aware that big G is not the only
one doing unpleasant things. In fact, even Microsoft (MSN) can be blamed:
http://schestowitz.com/Weblog/archives/2005/07/12/msnbot-fights-linux-servers/

Have you experimented with the custom error page settings? I can't think of
an appropriate widget in cPanel, but maybe there is a workaround which can
be applied at the level of an individual configuration file. Why do you have
404s coming up, if I may ask? If you inevitably have the occasional broken
link, why not repair each one as soon as it emerges? I check my error logs
almost 10 times a day for that reason.
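
Checking by hand gets tedious, so here is a rough Python sketch of how the
404s could be pulled out of a log automatically. It assumes an access log in
Apache's common or combined format. The names REQUEST_RE, count_404s and
SAMPLE_LOG are hypothetical, and the sample lines are made up for
illustration.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it,
# e.g.  "GET /old-page.html HTTP/1.1" 404
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404s."""
    hits = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "404":
            hits[match.group(1)] += 1
    return hits


# Made-up sample lines in combined log format (not real traffic).
SAMPLE_LOG = [
    '192.0.2.1 - - [04/Sep/2005:17:46:00 +0000] '
    '"GET /old-page.html HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [04/Sep/2005:17:47:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 2048 "-" "ExampleBrowser/1.0"',
    '192.0.2.1 - - [04/Sep/2005:19:46:00 +0000] '
    '"GET /old-page.html HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.3 - - [04/Sep/2005:20:01:30 +0000] '
    '"GET /typo.htm HTTP/1.0" 404 512 "-" "ExampleBrowser/1.0"',
]

if __name__ == "__main__":
    # List the most frequently requested missing paths first.
    for path, count in count_404s(SAMPLE_LOG).most_common():
        print(count, path)
```

For a real log, the lines of the file can be passed in place of SAMPLE_LOG,
for example by opening the file and handing the file object to count_404s.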

Roy

-- 
Roy S. Schestowitz      | "Turn up the jukebox and tell me a lie"
http://Schestowitz.com  |    SuSE Linux    |     PGP-Key: 74572E8E
  6:35pm  up 11 days  6:46,  3 users,  load average: 0.17, 0.49, 0.44
