Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
> __/ [ John Bokma ] on Thursday 25 May 2006 17:59 \__
>> However, "recently" it's possible to get a 502 Gateway error. So I have
>> my new scripts retry a few times with a few seconds time out between
>> each retry.
> 502? I never knew it was even defined.
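The retry itself is nothing fancy. A rough sketch of the idea in Python (my
actual scripts are Perl, and the function and parameter names here are made
up for illustration):

```python
import time

def fetch_with_retry(fetch, attempts=3, delay=5):
    """Call fetch() until it returns a non-502 status or attempts run out.

    fetch is any callable returning (status_code, body); a 502 response
    triggers a pause of `delay` seconds and another try.
    """
    for attempt in range(attempts):
        status, body = fetch()
        if status != 502:
            return status, body
        if attempt < attempts - 1:
            time.sleep(delay)  # wait a few seconds before retrying
    return status, body  # still 502 after the last attempt
```

The point is simply that a 502 is usually transient, so a couple of retries
with a short pause is enough; anything persistent is returned to the caller.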
>> Nah, wrong I think. Two reasons: first is it provides them data on
>> how it is used, by whom, and what requests are popular. I think this
>> data is worth a lot.
> I think that standard queries can be more meaningful. They are used for
> Google Trends, too.
But who is using the Google API? They tried to make a division between
normal users and ??? users. The question is, who are ??? users? I doubt
??? = SEO, but I am sure that the data is very interesting.
>> Second of all, a normal request uses up more bandwidth (guess, but I
>> am quite sure about that).
> Yes, but the normal user does not repeat things mechanically.
The API is limited to 1,000 requests. I am sure that 1,000 requests made the
normal way use far more bandwidth than 1,000 API requests. Moreover, a lot
of the data is just visual markup, and hence needs to be dropped anyway. The
meaningful data needs to be parsed out, redirects (rare) need to be fixed,
etc.
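To make the markup overhead concrete, here is a toy sketch (using Python's
bundled HTML parser and a made-up sample page) of parsing the text out of a
result page; everything the parser drops is bandwidth the API never sends:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content, dropping tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# A trivial stand-in for a search result page.
html = '<html><body><a href="http://example.com/">A result</a> snippet text</body></html>'

extractor = TextExtractor()
extractor.feed(html)
text = "".join(extractor.chunks)

# The markup overhead is everything that is not text.
overhead = len(html) - len(text)
```

Even in this tiny example the tags outweigh the text; on a real result page,
with headers, scripts and styling, the ratio is far worse.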
>>> I wonder if Apache has a similar mechanism for queueing
>>> requests from bots when actual people request pages. Could be
>> Why? What's the point in delivering a page "slower" to Googlebot?
> Humans are impatient, crawlers need not be impatient. Think about
> overloaded shared servers.
So what do you suggest? Put crawler requests in a queue?
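If one did want that, it might look like a priority queue keyed on the user
agent. A toy sketch (this is nothing Apache actually offers; the class and
the agent list are invented for illustration):

```python
import heapq
from itertools import count

# Simplistic, purely illustrative list of crawler user-agent fragments.
CRAWLER_AGENTS = ("googlebot", "slurp", "msnbot")

def priority(user_agent):
    """Lower number = served sooner; crawlers get the lower priority."""
    ua = user_agent.lower()
    return 1 if any(bot in ua for bot in CRAWLER_AGENTS) else 0

class RequestQueue:
    def __init__(self):
        self.heap = []
        self.order = count()  # tie-breaker keeps FIFO order within a class

    def push(self, user_agent, request):
        heapq.heappush(self.heap, (priority(user_agent), next(self.order), request))

    def pop(self):
        return heapq.heappop(self.heap)[2]
```

Which brings me straight back to my objection: the crawler entries never go
away, they just pile up behind the humans.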
>> Moreover, delivering the page means Apache is occupied with that
>> connection, and does exactly the opposite of what you have in mind.
>> Giving the bot a reply as fast as possible is better.
> Well, you can shuffle or re-prioritise the stack, putting crawlers
> higher on the stack.
And what is going to happen when the queue gets fuller and fuller with
crawler requests?
>> There is a compression mod that helps with this (the HTML page is
>> sent compressed).
> True, but it is an optimisation that applies to most. You can't really
> use that as a caveat, in my opinion.
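That would be mod_gzip on Apache 1.3 or mod_deflate on Apache 2. A minimal
sketch of the latter, assuming a stock module path:

```apache
# Apache 2.x httpd.conf: compress text responses on the fly
LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/plain text/css
```

It costs some CPU per request, but HTML compresses very well, so the
bandwidth saving is substantial for bots and humans alike.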
>> If you're really interested in giving bots the fastest answer
>> possible, you might consider "cloaking", i.e. strip all that's not
>> needed for the bot from your pages. And no, I doubt any SE is going
>> to punish you for that.
> ...Until too many people do it for SEO purposes, or whichever misuse
> /du jour/.
Of course I was talking about an optimized page with the same content,
which by definition can't be abused :-D
John
Freelance Perl programmer: http://castleamber.com/