Re: Crawling Behavior

  • Subject: Re: Crawling Behavior
  • From: "Wrm" <nomailstodragon@north.invalid>
  • Date: Wed, 10 Aug 2005 10:52:52 +0300
  • Newsgroups: alt.internet.search-engines
"Roy Schestowitz" <newsgroups@schestowitz.com> wrote:
> Wÿrm wrote:


> 'The' is a stop word. It doesn't (shouldn't) get indexed. Any results
> returned for such a query should be taken with a grain of salt.

Actually, it wasn't that long ago that Google also returned indexed pages for
stop words. Especially last year, before they "doubled" their index, it was
very obvious that in reality they already had way more than those 4 billion
pages in the index :)

> Inktomi (Yahoo) do not appear to pick up many images. They certainly don't
> add up to quite the same amount of traffic (bandwidth or hits). They also
> have bugs in their code which causes them to crawl sites incorrectly and
> raise many errors. In my mind, Yahoo have the poorest crawler (not search
> engine, but _crawler_) among the top 3 SE's.

That might indeed be the case.

> Yahoo probably likes your site. An old site of mine gets crawled primarily
> by MSN, slightly by Google, but is largely neglected by Yahoo.

True, I am top in Yahoo for my keywords. Then again, I am top in MSN and in
Google too, just not for as many keywords there. The main reason is that I
have been experimenting with on-page optimization and how it affects SERPs.

> Site maps are supposed to speed indexing up, or so I imagine.

True, I have been using Google Sitemaps too since they started to use those.
Before sitemaps there were some pages that weren't indexed. After Google
started to use sitemaps, it didn't take long before they had all pages
indexed, and considering I wasn't adding links, I believe it was because of
sitemaps.
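
For anyone who wants to try the same thing, here is a minimal sketch of
building a sitemap file in Python. It assumes the sitemaps.org 0.9 namespace
(the Google Sitemaps program originally used its own schema URL, so adjust as
needed); the example.com URLs and the build_sitemap name are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; swap in another schema if needed.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, lastmod or None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional; W3C date format, e.g. 2005-08-10
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    print(build_sitemap([
        ("http://example.com/", "2005-08-10"),
        ("http://example.com/about.html", None),
    ]))
```

Listing every page explicitly is the point: the crawler no longer has to
discover a page through links before it can fetch it.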

> They must be 'cycling' their attention. Perhaps the intervals between heavy
> crawls is more or less fixed...?

For some reason, Yahoo doesn't seem to do "heavy" crawling on my site; it is
very steady crawling instead. For MSN and Google that might be true, but the
latest logs, covering 1.5 months, show heavier crawling at very random times
from both engines. I need to check some of my older logs to see if there's a
certain interval, but I can't remember having seen anything like that so far.
On June 21st Googlebot was busy, July 27th was a busy day, and so far in
August it was the 3rd when they were busy. For some reason the moments when
Googlebot gets busy seem more random, though...
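
Checking older logs for a fixed interval could be scripted roughly like this.
It is only a sketch: it assumes Apache-style access logs with timestamps like
[10/Aug/2005:10:52:52 +0300] and English month names, it treats any line
containing "Googlebot" as a Googlebot hit, and the "busy" threshold (3x the
median daily count) is an arbitrary choice of mine. The function names are
made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Apache-style timestamp, e.g. [10/Aug/2005:10:52:52 +0300]
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def hits_per_day(lines, bot="Googlebot"):
    """Count log lines per calendar day that mention the given bot name."""
    counts = Counter()
    for line in lines:
        if bot not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            counts[day] += 1
    return counts


def busy_day_gaps(counts, factor=3.0):
    """Return (busy days, gaps in days between consecutive busy days).

    A day is "busy" when its count is at least `factor` times the median
    daily count (upper median when the number of days is even).
    """
    if not counts:
        return [], []
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    busy = sorted(day for day, n in counts.items() if n >= factor * median)
    gaps = [(b - a).days for a, b in zip(busy, busy[1:])]
    return busy, gaps


if __name__ == "__main__":
    # Hypothetical file name; point this at a real access log.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        busy, gaps = busy_day_gaps(hits_per_day(log))
    print("busy days:", busy)
    print("gaps between them (days):", gaps)
```

If the gaps come out roughly equal, the crawls are on a cycle. The three busy
days mentioned above are 36 and 7 days apart, which does not look like one.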

