Re: Google Sitemaps - sure way to lazy Googlebot?

  • Subject: Re: Google Sitemaps - sure way to lazy Googlebot?
  • From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
  • Date: Sat, 12 Nov 2005 06:21:47 +0000
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / MCC / Manchester University
  • References: <HF3df.4$eX4.0@fe07.news.easynews.com> <ht9df.5018$rP3.2635@fe3.news.blueyonder.co.uk>
  • Reply-to: newsgroups@xxxxxxxxxxxxxxx
  • User-agent: KNode/0.7.2
__/ [Eric Johnston] on Friday 11 November 2005 23:05 \__

> "www.1-script.com" <info_at_1-script_dot_com@xxxxxxx> wrote in message
> news:HF3df.4$eX4.0@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Hello good people in  a.i.s-e!
>> I want to run this observation by you and get some responses based on your
>> personal experience with G sitemaps. I have to admit, this may be a case
>> of ever increasing paranoia, but I can't help thinking that my using G
>> sitemaps changed Googlebot behavior on my site(s) in an adverse way.

Google Sitemaps are probably valuable when you know that the number of
indexed pages does not match what exists (publicly) on your domain. A
sitemap increases your capacity to have pages crawled, I assume: Googlebot
can crawl the entire list rather than re-crawl the same pages over and over
again in a badly structured (non-standardised) tree. This assumes that all
pages are static. If they are not statically placed (e.g. if they are
generated from a database), then that database should not change either, or
should change only linearly (by expansion). A single crawl+index is usually
sufficient, *for good*, especially if you archive message history that is
immutable.
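
To illustrate the "crawl the entire list" idea, here is a minimal sketch
that builds a sitemap from a flat list of page addresses. The function name
build_sitemap and the example.com addresses are made up for illustration,
and the namespace is an assumption: it is the one from the sitemaps.org 0.9
protocol, which may not be the schema version your own sitemap declares.

```python
import xml.etree.ElementTree as ET

# Assumption: sitemaps.org 0.9 namespace; adjust to the schema you target.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing each URL once, in order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            # Skip duplicates so every page appears exactly once.
            continue
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, immutable archive pages.
    pages = [
        "http://example.com/archive/2005/11/msg0001.html",
        "http://example.com/archive/2005/11/msg0002.html",
    ]
    print(build_sitemap(pages))
```

For an archive that only ever grows, regenerating the list amounts to
appending the new addresses; the existing entries never change.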

>> It has been two months since I started using sitemaps. Right after I put
>> them up the G saturation (site:www.xxx.com) jumped from 100K+ pages to
>> 1.2M+, which got me very excited indeed. However, the traffic has not
>> increased a bit, which you'd think is strange because just by sheer luck
>> having 12 times more pages would get you at least couple times more
>> visits, but this was not the case. Additionally, having been watching
>> Googlebot's activity, especially that of  "Deepbot"  (I am a true believer
>> in Freshbot/Deepbot concept), I noticed a strange trend: the number of
>> Deepbot visits went down sharply - from 2K-3K per day to 50-300 per day.
>> In line with this trend was the actual measurable traffic from Google -
>> pretty steady. Down one day, up another, but stable over long run (Jagger
>> time included).
>> There are deviations from this behavior - such as enormous spike in
>> Deepbot's activity two days ago (75K pages/day), but the end result of
>> using the Sitemap seems to be grand ZERO if not negative. Traffic from G
>> was rising steadily before I put the maps up, and then stopped at that
>> level, which makes me believe the number of Deepbot visits does matter.

I am not too sure (see below).

>> Does anyone confirm that trend? I'm contemplating yanking the sitemaps off
>> my sites, but don't want to do any damage, obviously, so I'm looking for
>> some additional supporting data.

Have you seen a drop in the number of referrals from Google? Could
inconsistencies among datacentres be to blame (at least temporarily)?
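
One way to check is to count, per day, both the Googlebot requests and the
visits referred by Google in your access log. This is only a rough sketch.
It assumes the common "combined" log format, the function name
daily_google_counts is made up, and it cannot tell "Deepbot" from
"Freshbot" because both would identify themselves as Googlebot.

```python
import re
from collections import Counter

# Assumption: "combined" log format, e.g.
# 1.2.3.4 - - [12/Nov/2005:06:21:47 +0000] "GET /x HTTP/1.1" 200 512 "referrer" "agent"
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
GOOGLE_REFERRER_RE = re.compile(r"https?://(www\.)?google\.")


def daily_google_counts(lines):
    """Return (referrals, crawls): per-day Counters keyed by the log date."""
    referrals = Counter()
    crawls = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # Ignore lines that are not in the expected format.
        day = match.group("day")
        if "Googlebot" in match.group("agent"):
            crawls[day] += 1
        elif GOOGLE_REFERRER_RE.match(match.group("referrer")):
            referrals[day] += 1
    return referrals, crawls
```

Comparing the two series before and after the sitemap went up would show
whether the drop in crawling coincides with any change in referrals.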

> I don't use a sitemap but have been using the * facility in robots.txt to
> remove many pages from the Google index in the hope that removing large
> numbers of near duplicate pages with zero information value might be
> beneficial in improving the average value of pages on the site.  Until
> recently I did not realise just how many rubbish pages were being generated
> by my forum like "user profile pages", all with different file names, but
> which all contained just the words like "Sorry <name> please log in first"
> If you use the sitemap maybe you should be selective about indexing the
> good pages and leaving out the rubbish.

I noticed a similar mistake on my site, which I recently corrected using
robots.txt. The mistake had led to the outcomes of computer vision
experiments being indexed, almost tripling the total number of indexed
pages. It's mainly numbers, so no harm will be done. Nonetheless, this
consumed spidering traffic unnecessarily.
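
The same selectivity can be applied when a sitemap is generated. Below is a
minimal sketch that drops low-value pages before they are listed. The
patterns and the example.com addresses are hypothetical, as are the names
worth_listing and select_for_sitemap; the matching is plain shell-style
globbing on the URL path, not the robots.txt syntax itself.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical patterns for pages with no information value.
EXCLUDE_PATTERNS = ["/forum/profile*", "/experiments/*"]


def worth_listing(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL path matches none of the exclusion patterns."""
    path = urlsplit(url).path  # The query string is not part of the path.
    return not any(fnmatchcase(path, pattern) for pattern in patterns)


def select_for_sitemap(urls, patterns=EXCLUDE_PATTERNS):
    """Keep only the URLs worth listing, preserving their order."""
    return [url for url in urls if worth_listing(url, patterns)]
```

Pages excluded here should still be disallowed in robots.txt; leaving them
out of the sitemap alone does not stop them from being crawled.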


Roy S. Schestowitz      |    "ASCII stupid question, get a stupid ANSI"
http://Schestowitz.com  |    SuSE Linux     |     PGP-Key: 0x74572E8E
  6:10am  up 9 days  2:08,  4 users,  load average: 0.54, 0.77, 0.81
      http://iuron.com - next generation of search paradigms
