Well, I've finally released a public beta of my log scrubbing utility.
It's been six months in the making: concept work, several months of internal testing, and some limited user testing. Now I need serious feedback before it can make the leap to a production version.
You might be asking yourself, "Why should I optimize my log files? After all, my web analytics program already filters hits out of them." The answer is simple: do you want to speed up the time it takes to process your access logs and generate more accurate results?
Think about it: each hit filter takes time to review every hit and decide whether to include or exclude it from the reported results. Now repeat that extra time every time you process your log files. The cost can be anything from a few extra minutes to hours a day, depending on how much data is being filtered. For one tester, simply removing a single specific page (used as part of a 404 redirect) from the log files meant almost 9 million fewer pages a year that had to be filtered out of the analysis.
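To make the idea concrete, here's a minimal sketch of that kind of scrub, written in Python. The page name, log path, and function name are all assumptions for illustration; they aren't taken from the beta tool itself.

```python
import re

# Hypothetical pattern: drop hits for one unwanted page
# (e.g. a page used as part of a 404 redirect).
SCRUB_PATTERN = re.compile(r'"GET /404-redirect\.html[^"]*"')

def scrub_log(in_path, out_path):
    """Copy an access log, dropping hits that match the scrub pattern.

    Returns (kept, removed) line counts so you can review what was cut.
    """
    kept = removed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if SCRUB_PATTERN.search(line):
                removed += 1       # log file dirt: don't write it out
            else:
                dst.write(line)    # clean hit: keep it for analysis
                kept += 1
    return kept, removed
```

Run once before your analytics pass, this removes the unwanted hits a single time instead of making every subsequent report re-filter them.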
As for accuracy, it's easy to miss setting a filter to catch every variation, which may result in some pages being accidentally excluded from, or included in, your results.
By pre-processing your logs and reviewing the extracted data (the log file dirt), you can be sure that only the items you don't want are removed, and you can significantly reduce the size of your web log files.
So give log scrubbing a try. For more information about optimizing (scrubbing) your access logs, and to download the beta, visit my website, K'nechtology.