Why Pre-process Access Logs (Log scrubbing)

Well, I've finally released a public beta of my log scrubbing utility.

It's been six months in concept, followed by several months of internal testing and some limited user testing. Now I need some serious feedback before it can make the leap to a production version.

You might be asking yourself, "Why should I optimize my log files when my web analytics program already filters hits out of them?" The answer is really simple: do you want to cut the time it takes to process your access logs and get more accurate results?

Think about it: each hit filter takes time to review each hit and decide whether it should be included in or excluded from the reported results. That extra time is repeated every time you process your log files. The result can be anything from a few extra minutes to hours a day, depending on how much data is being filtered. For one tester, simply removing a single specific page (used as part of a 404 redirect) from its log files resulted in almost 9 million fewer pages a year that had to be filtered out of the analysis.

As for accuracy, it's easy to miss a case when setting up a filter to capture the variety of possibilities. This may result in some pages being accidentally excluded from or included in your results.

By pre-processing and reviewing the extracted data (log file dirt), you can be sure that only items you don't want are removed and significantly reduce the size of your web log files.
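To make the idea concrete, here is a minimal sketch of a pre-processing pass. It is not how my utility works internally, just an illustration. It assumes logs in Common or Combined Log Format, and the page name `/404redirect.html` and the function names are hypothetical. Removed lines are kept in a separate list so the "dirt" can be reviewed before it is thrown away.

```python
from typing import Iterable, List, Optional, Set, Tuple


def request_path(line: str) -> Optional[str]:
    """Return the requested path (without query string), or None if unparsable.

    Assumes Common/Combined Log Format, where the request is the first
    quoted field, e.g. "GET /index.html HTTP/1.1".
    """
    try:
        request = line.split('"')[1]       # text between the first pair of quotes
        target = request.split()[1]        # METHOD <target> PROTOCOL
    except IndexError:
        return None
    return target.split("?")[0]            # ignore any query string


def scrub(lines: Iterable[str], excluded_paths: Set[str]) -> Tuple[List[str], List[str]]:
    """Split log lines into (kept, removed) based on the requested path.

    Lines that cannot be parsed are kept, so nothing is dropped by accident.
    """
    kept: List[str] = []
    removed: List[str] = []
    for line in lines:
        if request_path(line) in excluded_paths:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [27/Jun/2006:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 1043',
        '10.0.0.2 - - [27/Jun/2006:10:00:01 -0400] "GET /404redirect.html?from=/old HTTP/1.1" 200 512',
    ]
    kept, removed = scrub(sample, {"/404redirect.html"})
    print(len(kept), "kept,", len(removed), "removed for review")
```

Because the scrub runs once, before the analytics program sees the file, the filtering cost is paid a single time instead of on every report run.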

So give log scrubbing a try. For more information about optimizing (scrubbing) your access logs, and to download the beta, visit my website, K'nechtology.


Stop Google From Using DMOZ Listing

Here is a great article on the new meta tag, originally created by MSN and recently embraced by Google.

Why would you want to stop Google from using your listing in the Open Directory Project (ODP), aka DMOZ? Simply put, ODP has always been slow to update listings to reflect changes in a company's offerings, or it might simply have the listing wrong. Either way, this could affect your ranking in the SERPs or even what is displayed as your site description within the SERP.

It's all covered in the article from Stepforth.
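The post above doesn't name the tag, but as far as I know it is the NOODP directive, written as `<meta name="robots" content="noodp">` (or with `name="googlebot"` to target Google only). If you want to confirm a page actually carries it, here is a small sketch using only the Python standard library. The class and function names are my own, and the check is deliberately simple.

```python
from html.parser import HTMLParser
from typing import Dict, Set


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Dict[str, Set[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            # content is a comma-separated list, e.g. "noodp, follow"
            tokens = {t.strip().lower() for t in attr.get("content", "").split(",")}
            tokens.discard("")
            self.directives.setdefault(name, set()).update(tokens)


def opts_out_of_odp(html: str) -> bool:
    """True if any robots/googlebot meta tag in the page includes 'noodp'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noodp" in tokens for tokens in parser.directives.values())


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NOODP"></head></html>'
    print(opts_out_of_odp(page))
```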


Firefox 2.0 Now in Beta

Mozilla has just released a beta of Firefox 2.0. It's a big jump from the current production version, 1.5.

Just take a look at the new or enhanced features:
* Built-in phishing protection
* Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
* Changes to tabbed browsing behavior
* Ability to re-open accidentally closed tabs
* Better support for previewing and subscribing to web feeds
* Inline spell checking in text boxes
* Search plugin manager for removing and re-ordering search engines
* New microsummaries feature for bookmarks
* Automatic restoration of your browsing session if there is a crash
* New combined and improved Add-Ons manager for extensions and themes
* New Windows installer based on Nullsoft Scriptable Install System
* Support for JavaScript 1.7
* Support for client-side session and persistent storage
* Extended search plugin format
* Updates to the extension system to provide enhanced security and to allow for easier localization of extensions
* Support for SVG text using svg:textPath

Of course, it's still early in its beta, but keep an eye open for its release. To find out more, go to http://developer.mozilla.org/devnews/


Search Engine Market Share in China

Seems a lot of people have been reading my previous post about search engine market share here in Canada and in the United States.

So I dug up some more numbers for you. Hope you find this useful.

Search Engine Market Share for China
TMCnet.com reports:
Baidu leads with 43.9%
Yahoo is in second place with 21.1%
Google is in third place in China, with 13.2%

China's Internet search engine sector achieved a total income of CNY 303 million in the first quarter of this year.

TMCnet sources this information from the Tuesday, June 27, 2006 edition of the Beijing Modern Business Daily.

The full article can be accessed at www.tmcnet.com/usubmit/2006/06/28/1699289.htm