__/ [Borek] on Sunday 01 January 2006 17:53 \__
> On Sat, 31 Dec 2005 18:39:33 +0100, Roy Schestowitz
> <newsgroups@xxxxxxxxxxxxxxx> wrote:
>> Does anybody know a (preferably free) tool that will extract crawlers
>> data from raw log files and produce a day-by-day breakdown of traffic
>> from each search engine? I can see total (aggregated) volumes using
>> available tools, but they tend to be more (human) visitor-oriented.
> My logs are already cut into days, so all I need to do is
> grep -c inktomisearch.com statslog.20060101.txt
> grep -c msnbot statslog.20060101.txt
> grep -c googlebot statslog.20060101.txt
> But then - even if you have all logs kept in one file - it should
> be enough to modify above to something like
> egrep -c "googlebot.*31/Dec/2005" yourlogfilename
> It is only a question of finding proper string to search for :)
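For a day-by-day breakdown, those one-liners could be wrapped in a loop over the dated log files. An untested sketch, assuming per-day logs named statslog.YYYYMMDD.txt as above; the sample log lines here are fabricated purely for illustration:

```shell
#!/bin/sh
# Sketch: count crawler hits per day and emit a CSV.
# The two sample logs below are fabricated illustration data.
mkdir -p /tmp/botstats && cd /tmp/botstats
printf 'googlebot hit /\nmsnbot hit /\ngooglebot hit /about\n' > statslog.20051230.txt
printf 'crawl.inktomisearch.com hit /\ngooglebot hit /\n' > statslog.20051231.txt

echo "date,googlebot,msnbot,inktomi" > bots.csv
for f in statslog.*.txt; do
    day=${f#statslog.}; day=${day%.txt}          # strip prefix/suffix to get YYYYMMDD
    g=$(grep -c googlebot "$f")
    m=$(grep -c msnbot "$f")
    i=$(grep -c inktomisearch.com "$f")
    echo "${day},${g},${m},${i}" >> bots.csv
done
cat bots.csv
```

The resulting CSV can then be opened in Calc or fed to gnuplot for graphing.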
I only began to think about this approach as the discussion progressed. I
suppose there is also a variety of other analyses, such as page requests,
hits, and frequency, plus visualisation of the results, e.g. as a graph.
Calc or gnuplot can do that, but not in a serialised (automated) fashion,
which is the advantage of self-contained stats packages.
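That said, the graphing step itself is easy to script. A hedged sketch, assuming a bots.csv with columns date,googlebot,msnbot,inktomi (the column layout is my assumption, not from the thread); gnuplot is only invoked if it is actually installed:

```shell
#!/bin/sh
# Sketch: write a gnuplot script that charts per-day crawler counts
# from an assumed bots.csv (date,googlebot,msnbot,inktomi).
cat > /tmp/bots.gp <<'EOF'
set datafile separator ","
set xdata time
set timefmt "%Y%m%d"
set format x "%d/%m"
set terminal png size 640,400
set output "bots.png"
plot "bots.csv" using 1:2 with lines title "googlebot", \
     ""         using 1:3 with lines title "msnbot"
EOF
# Render only when gnuplot and the data file are present
command -v gnuplot >/dev/null 2>&1 && [ -f bots.csv ] && gnuplot /tmp/bots.gp
```

Run from cron once a day and the graph stays current, which gets part of the way towards what the stats packages do automatically.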
John Bokma wrote a script to analyse Google crawling behaviour and write the
output as CSV or tab-delimited values.