__/ [ hug ] on Tuesday 21 March 2006 13:22 \__
> I've been working on a new niche-oriented ecommerce engine for what
> seems like forever now. It appears to be slowly approaching that
> state sometimes known as "done" which is generally defined by the
> process "call it good".
> Now, as part of this gizmo, I've added some instrumentation to allow
> the webmaster or site-owner to determine this, that, and the other.
> For example it will tell you what pages were viewed in what order and
> for how long, etc.
> One part of the instrumentation is related to tracking robots. Here
> is where I would like to ask your opinion.
> Currently it has the following:
> * Recognize robots in several ways, with operator confirmation.
> * For each robot, it tracks:
> - what ip-addresses the robot has used
> - what pages the robot has fetched
> - when each page the robot fetched was last fetched
> It reports this information in various ways, for example it will tell
> you which robots are active (page fetched in last xx minutes), it will
> tell you all the pages a given robot has fetched along with date/time
> last fetched, and it will report on all your pages telling you which
> robots have fetched them.
> However, I have the feeling that some important kinds of information
> are missing, information I could be capturing if I just thought to
> capture it.
> So here is my question. For your own purposes in furthering your
> website's search-engine optimization, what information would be most
> useful?
* Volume of crawling, expressed in terms of bandwidth (most easily digestible,
as it's a single number that captures many pertinent details).
* Breakdown by crawler name (vendor). This enables the Webmaster to assess
how SEO affects the different algorithms, and specifically shows which
content attracts more attention, crawling, and activity.
Various other things, some of which were mentioned above, can be queried
for without a front-end/instrumentation toolset. E.g. the pages fetched by a
robot can be checked using "site:URL". The last-fetched time per page is of
little value if general traffic is high and the number of pages is large. I
also cannot see how the IP addresses of robots contribute anything.
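To make the bandwidth suggestion concrete, here is a rough sketch in Python
that tallies crawl volume per bot from an Apache combined-format access log.
The bot-name list and the log format are assumptions on my part; adapt both
to whatever your server actually writes:

```python
import re
from collections import defaultdict

# Apache "combined" log format, e.g.:
# 66.249.66.1 - - [21/Mar/2006:13:22:05 +0000] "GET /a.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent substrings; extend with whatever bots you care about.
BOT_SIGNATURES = ["Googlebot", "Slurp", "msnbot", "Teoma"]

def crawl_bandwidth(lines):
    """Sum response bytes per recognised bot user-agent."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't parse
        agent = m.group("agent")
        for bot in BOT_SIGNATURES:
            if bot in agent:
                sent = m.group("bytes")  # "-" means no body was sent
                totals[bot] += 0 if sent == "-" else int(sent)
                break
    return dict(totals)
```

The single number per vendor can then be graphed over time, which is where
the "one number that contains many details" claim pays off.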
> [And if anyone can tell me why I keep typing "robit" instead of
> "robot" that would be greatly appreciated!]
A typing-pattern habit, as in "weird/wierd", or maybe a Freudian slip (Rabbit?).
Roy S. Schestowitz | Coffee makes mw to0 jittery
http://Schestowitz.com | SuSE Linux ¦ PGP-Key: 0x74572E8E
2:10pm up 13 days 6:47, 7 users, load average: 0.81, 0.88, 0.77
http://iuron.com - next generation of search paradigms