
Re: Daily Crawlers Breakdown

__/ [Brian Wakem] on Saturday 31 December 2005 18:21 \__

> Roy Schestowitz wrote:
> 
>> Does anybody know a (preferably free) tool that will extract crawlers data
>> from raw log files and produce a day-by-day breakdown of traffic from each
>> search engine? I can see total (aggregated) volumes using available tools,
>> but they tend to be more (human) visitor-oriented.
>> 
>> Thanks in advance,
>> 
>> Roy
>> 
>> PS - I am skeptical about this. I would be surprised if something free
>> exists which is capable of achieving this. Happy to be proven wrong, even
>> if silence implies that.
> 
> 
> I only monitor the big 3.
> 
> 
> #######################################
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> my @bots = ('Googlebot','Yahoo! Slurp','msnbot');
> my %ua;
> # Three-argument open with a lexical filehandle is safer with odd filenames.
> open (my $log, '<', $ARGV[0]) or die "Can't open log ($ARGV[0]) - $!";
> while(<$log>){
>         chomp;
>         if (m!("(.*?)" ".*?"$)!) { # <-- depends on logfile format
>                 my $ua = $1;
>                 foreach (@bots) {
>                         if (index ($ua, $_) != -1){
>                                 $ua{$_}++;
>                                 last;
>                         }
>                 }
>         }
> }
> close $log;
> 
> foreach my $ua( sort { $ua{$b} <=> $ua{$a} } keys %ua ) {
>         printf "%-12s\t%d\n", $ua,$ua{$ua};
> }
> 
> ############################################
> 
> 
> $ ./bot /usr/local/apache2/logs/access_log
> Googlebot       10595
> Yahoo! Slurp    1326
> msnbot          12

Thanks a bunch, Brian. Since Perl is double Dutch to me, is there any way of
having the above script separate the numbers by day? I only had a shallow
look, but I suspect the functionality is there, somewhere.

It's important to me as I suspect a certain batch of links (WordPress
support) encouraged a lot of crawling, but I can't tell to what extent, if
at all. I haven't kept track of a daily running sum, so I need to look at
this in retrospect. Visitors and AWStats do not offer this functionality. I
don't know whether Analytics does, as I have never managed to use it properly.

Many thanks,

Roy
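
For reference, a per-day variant of the quoted script could look roughly like
the sketch below. It is not Brian's code, only an illustration under some
assumptions: the log uses Apache-style timestamps such as
[31/Dec/2005:18:21:00 +0000], the user agent is matched with the same
format-dependent pattern as above, and the helper name count_by_day and the
yyyy-mm-dd day key are made up for this example.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my @bots = ('Googlebot', 'Yahoo! Slurp', 'msnbot');

# Map month abbreviations to numbers so that days sort chronologically.
my %month;
@month{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# Hypothetical helper: tally hits per day and per bot from an open filehandle.
# Assumes timestamps of the form [dd/Mon/yyyy:hh:mm:ss zone].
sub count_by_day {
    my ($fh, @names) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        # Pull the date out of the first bracketed timestamp; skip odd lines.
        next unless $line =~ m!\[(\d{2})/(\w{3})/(\d{4}):!;
        my ($d, $m, $y) = ($1, $month{$2}, $3);
        next unless defined $m;
        my $day = sprintf '%04d-%02d-%02d', $y, $m, $d;
        # Same format-dependent match as in the quoted script.
        next unless $line =~ m!("(.*?)" ".*?"$)!;
        my $ua = $1;
        foreach my $bot (@names) {
            if (index($ua, $bot) != -1) {
                $count{$day}{$bot}++;
                last;
            }
        }
    }
    return \%count;
}

# Only read a log when one is named on the command line.
if (@ARGV) {
    open(my $log, '<', $ARGV[0]) or die "Can't open log ($ARGV[0]) - $!";
    my $count = count_by_day($log, @bots);
    close $log;
    foreach my $day (sort keys %$count) {
        foreach my $bot (@bots) {
            printf "%s\t%-12s\t%d\n", $day, $bot, $count->{$day}{$bot} || 0;
        }
    }
}
```

Run the same way as the original, with the log file as the only argument, it
would print one tab-separated line per day and bot (date, bot name, hit
count), with 0 for a bot that did not show up on a given day. Days with no
hits from any of the listed bots are not printed at all.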
