John Bokma wrote:
> Gandalf Parker wrote:
>> Byron <email@example.com> wrote in
>>> Is there a piece of software that can cross-reference the site root
>>> with the log files and produce a list of files that have had hits in
>>> the last 6 months?
>> You mean have NOT had hits? I could write one I think. Basically it
>> would list the files in the directory, then search for each one in the
> Ouch, O(n^2).
> Better, hash each page URI from the log, preferably with a count. ( O(n) )
> Next, find each page in the htdocs directory, turn it into a relative URI,
> check if it's in the hash, if not, add it and set the count to 0. ( O(n) )
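
For what it's worth, the hash approach quoted above might look roughly like the sketch below. It is only an illustration under some assumptions that are not from the thread: the log is in Common Log Format, query strings are ignored, and the names count_hits, site_uris and unvisited are made up for the example. It does not filter by date (so the six-month window would have to be handled by choosing which log files to feed in) and it does not map "/" to an index file.

```python
import os
import re
from collections import Counter

# Assumption: Common Log Format, where the request appears as "GET /path HTTP/1.x".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')

def count_hits(log_lines):
    """One pass over the log: hash each requested URI with a hit count."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            # Drop any query string so /page.html?x=1 counts as /page.html.
            uri = match.group(1).split("?", 1)[0]
            hits[uri] += 1
    return hits

def site_uris(docroot):
    """One pass over the document root: yield each file as a root-relative URI."""
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            yield "/" + rel.replace(os.sep, "/")

def unvisited(docroot, log_lines):
    """Return the sorted URIs of files under docroot with no hits in the log."""
    hits = count_hits(log_lines)
    # Looking up a missing key in a Counter returns 0 without inserting it.
    return sorted(uri for uri in site_uris(docroot) if hits[uri] == 0)
```

Called as unvisited("/path/to/htdocs", open("access.log")), it reads the log once and walks the tree once, so the cost grows linearly with the log size plus the file count rather than with their product.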
Or just write a naive shell script and let the machine sweat overnight. The
real problems crop up when re-using the scripts for very large sites
(>10,000 files) with very big (>100 MB) logs.
Roy S. Schestowitz