

Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:

> __/ [ John Bokma ] on Thursday 25 May 2006 17:59 \__

Google API

>> However, "recently" it's possible to get a 502 Gateway error. So I have
>> my new scripts retry a few times, with a few seconds' timeout between
>> each retry.
> 502? I never knew it was even defined.
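The retry-with-timeout idea described above is language-agnostic (the scripts in the thread are Perl); here is a minimal Python sketch of the same pattern, with a fake query standing in for the real API call:

```python
import time

def retry(fn, attempts=3, delay=2):
    """Call fn(); on failure, wait `delay` seconds and try again,
    up to `attempts` tries in total. Re-raises the last error."""
    for attempt in range(attempts):
        try:
            return fn()
        except OSError as err:  # e.g. a 502 surfacing as a network/HTTP error
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Example: a fake query that fails twice with a "502", then succeeds.
calls = {"n": 0}
def fake_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("502 Bad Gateway")
    return "result"

print(retry(fake_query, attempts=5, delay=0))  # prints: result
```

The function names and the choice of exception type are illustrative; a real client would catch whatever error its HTTP library raises for a 502.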


>> Nah, wrong I think. Two reasons: first, it provides them with data on
>> how it is used, by whom, and which requests are popular. I think this
>> data is worth a lot.
> I think that standard queries can be more meaningful. They are used for
> Google Trends, too.

But who is using the Google API? They tried to draw a dividing line between 
normal users and ??? users. The question is: who are the ??? users? I doubt 
??? = SEO, but I am sure that the data is very interesting.

>> Second, a normal request uses up more bandwidth (a guess, but I am
>> quite sure about it).
> Yes, but the normal user does not repeat things mechanically.

The API is limited to 1,000 requests. I am sure that 1,000 requests done the 
normal way take up far more bandwidth than 1,000 API requests. 
Moreover, a lot of the data is just visual markup, which needs to be 
dropped anyway: the meaningful data has to be parsed out, redirects 
(rare) need to be fixed, etc.
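The "drop the visual markup, parse out the meaningful data" step above can be sketched like this (again in Python rather than the thread's Perl, with a made-up snippet of HTML as input):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only text content, dropping tags plus script/style bodies."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)

# Illustrative input: most of the bytes are markup, not content.
html = ('<html><head><style>b{color:red}</style></head>'
        '<body><b>Ten</b> results</body></html>')
p = TextExtractor()
p.feed(html)
text = " ".join(s.strip() for s in p.parts if s.strip())
print(text)  # prints: Ten results
```

A real scraper would also have to follow the rare redirects mentioned above; this only shows the markup-stripping part.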

>>> I wonder if Apache has a similar mechanism for querying
>>> requests from bot when actual people request pages. Could be
>>> handy... 
>> Why? What's the point in delivering a page "slower" to Googlebot?
> Humans are impatient, crawlers need not be impatient. Think about
> overloaded shared servers.

So what do you suggest? Put crawler requests in a queue? 

>> Moreover, delivering the page means Apache is occupied with that
>> connection, which does exactly the opposite of what you have in mind.
>> Giving the bot a reply as fast as possible is better.
> Well, you can shuffle or re-prioritise the stack, putting crawlers
> higher on the stack.

And what is going to happen when the queue gets fuller and fuller with 
crawler requests waiting their turn?
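One reading of the queuing idea being debated here, sketched in Python: requests are classed by User-Agent, humans are served first (they are the impatient ones), and a bounded queue makes the "fuller and fuller" problem explicit. The priorities, cap, and URLs are all made up for illustration:

```python
import heapq
import itertools

# Hypothetical priorities: lower number = served first.
PRIORITY = {"human": 0, "crawler": 1}
counter = itertools.count()  # tie-breaker keeps FIFO order within a class

queue = []
MAX_PENDING = 3  # assumed cap; beyond it, new requests are turned away

def enqueue(kind, url):
    """Accept a request into the bounded priority queue, or reject it."""
    if len(queue) >= MAX_PENDING:
        return False  # this is what happens when the queue keeps filling
    heapq.heappush(queue, (PRIORITY[kind], next(counter), url))
    return True

enqueue("crawler", "/robots.txt")
enqueue("human", "/index.html")
enqueue("crawler", "/page2.html")
accepted = enqueue("human", "/late.html")  # rejected: queue is full

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)     # prints: ['/index.html', '/robots.txt', '/page2.html']
print(accepted)  # prints: False
```

Note the downside the thread points out: a cap or a low priority for crawlers just moves the problem; either bots pile up in the queue or they get rejected outright.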

>> There is a compression mod that helps with this (the HTML page is
>> sent compressed).
> True, but it is an optimisation that applies to most. You can't really
> use that as a caveat, in my opinion.
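For reference, on Apache 2.x the compression mod mentioned above is mod_deflate; a minimal configuration sketch, assuming the module is loaded (on Apache 1.3 the equivalent was the third-party mod_gzip):

```apacheconf
# Compress common text responses before sending them over the wire.
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript
</IfModule>
```

The list of MIME types is illustrative; already-compressed content such as images gains nothing from this filter.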
>> If you're really interested in giving bots the fastest answer
>> possible, you might consider "cloaking", i.e. strip all that's not
>> needed for the bot from your pages. And no, I doubt any SE is going
>> to punish you for that.
> ...Until too many people do it for SEO purposes, or whichever misuse
> /de jour/.

Of course I was talking about an optimized page with the same content, 
which can't be abused by definition :-D
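The benign "cloaking" described above — same content, leaner wrapping for crawlers — might look like this as a Python sketch. The bot token list, page bodies, and function name are all hypothetical:

```python
# User-Agent fragments for a few well-known crawlers (illustrative list).
BOT_TOKENS = ("googlebot", "slurp", "msnbot")

FULL_PAGE = ("<html><head><style>/* lots of CSS */</style>"
             "<script>/* widgets */</script></head>"
             "<body><p>Article text.</p></body></html>")
LEAN_PAGE = "<html><body><p>Article text.</p></body></html>"

def page_for(user_agent):
    """Serve the stripped page to crawlers, the full page to browsers.
    The visible content is identical, so nothing is misrepresented."""
    ua = user_agent.lower()
    if any(token in ua for token in BOT_TOKENS):
        return LEAN_PAGE
    return FULL_PAGE

print(len(page_for("Mozilla/5.0 (compatible; Googlebot/2.1)")))  # lean page
print(len(page_for("Mozilla/5.0 (Windows; Firefox)")))           # full page
```

This relies on the User-Agent header, which is trivially spoofed; that is acceptable here precisely because both variants carry the same content.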

John                  Freelance Perl programmer: http://castleamber.com/

Quick Bookmarks:http://johnbokma.com/firefox/quick-launch-bookmarks.html
