Web sites are churning out mountains of clickstream data as dot-com companies expand their e-business operations. Soon, Microsoft Corp. will be providing new mountain-climbing gear.
The Redmond, Wash., company is developing customizable software and services that it says will provide a way to harvest and analyze massive amounts of data culled by e-commerce sites.
With the as-yet-unnamed program, due by the end of the year, Microsoft is putting its muscle behind an emerging concept that some call "Webhousing" -- applying data warehouse principles to clickstream data to better tailor e-business to the tastes of finicky surfers and shoppers.
The Microsoft offering, which will be available initially through several dozen select partners, is based on an internal implementation called Integrated Decision Support System. Microsoft has been using IDSS for several months to collect some 200GB of data per day from its Web pages.
Microsoft can reduce that 200GB, most of which is useless, to 500MB through a pair of parsing and filtering utilities with compression algorithms and a preaggregation tool from Syncsort Inc. The data is then shipped to SQL Server via the database's built-in data movement tool and organized in OLAP (online analytical processing) cubes in SQL Server's OLAP Services engine.
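To make the parse, filter and preaggregate steps concrete, here is a minimal Python sketch. It is not Microsoft's or Syncsort's tooling. It assumes the raw hits arrive in Common Log Format, which the companies did not specify, and the noise filter (static image and style files, non-200 responses) and all function names are hypothetical. The point is only to show how most raw hits can be discarded and the rest collapsed into small per-day, per-page counts before loading into a database.

```python
import re
from collections import Counter

# Assumption: hits are in Common Log Format. The article does not say
# which log format Microsoft's Web servers actually produce.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical definition of "useless" hits: requests for static assets.
NOISE_SUFFIXES = (".gif", ".jpg", ".css", ".js", ".ico")


def parse_line(line):
    """Parse one log line into a dict of fields, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_useful(hit):
    """Keep only successful page requests, dropping static assets and errors."""
    return hit["status"] == "200" and not hit["path"].lower().endswith(NOISE_SUFFIXES)


def preaggregate(lines):
    """Collapse raw hits into (day, path) -> hit count."""
    counts = Counter()
    for line in lines:
        hit = parse_line(line)
        if hit and is_useful(hit):
            day = hit["time"].split(":", 1)[0]  # e.g. "12/Jun/2000"
            counts[(day, hit["path"])] += 1
    return counts
```

Each surviving row is one (day, page, count) triple, which is the kind of compact fact table an OLAP cube can be built from.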
No easy task
Transforming huge volumes of cryptic clickstream data into a manageable form is among the greatest issues that Webhousing poses, experts say.
"It's a big challenge to overcome, to get to a point where the data is meaningful, and it's not an easy problem to solve," said Mark Beavers, a senior manager of online operations at Dell Computer Corp., in Round Rock, Texas.
Dell, whose e-commerce operation generates $30 million a day in revenue, is far from mastering the process. The company, which uses SQL Server and other Microsoft products, currently moves more than 200GB of daily clickstream data but does only rudimentary analysis on it, Beavers said.
A package that minimizes the pain of Webhousing sounds appealing, Beavers said. "I think it's a fairly untapped market," he said.
"You need to know how long someone is at your site and what pages they're looking at," said Jeff Block, vice president and CIO at SelectTeeTimes.com, a San Diego-based company that uses SQL Server. "'Stickiness' is the name of the game -- stickiness translates to dollars. The only way you can be sticky is to know what your users are doing and tailor your site that way."
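The two measures Block describes, how long a visitor stays and which pages he or she views, can be derived from raw hits by grouping them into sessions. The Python sketch below is illustrative only and is not SelectTeeTimes.com's method. It assumes each hit is a (visitor_id, timestamp, path) tuple and that 30 minutes of inactivity ends a visit; both the tuple layout and the timeout are assumptions, as are the function names.

```python
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes between hits starts a new visit.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits):
    """Group (visitor_id, timestamp, path) hits into per-visitor sessions."""
    sessions = []
    open_sessions = {}  # visitor_id -> session currently being built
    for visitor, ts, path in sorted(hits, key=lambda h: (h[0], h[1])):
        current = open_sessions.get(visitor)
        if current is None or ts - current["end"] > SESSION_TIMEOUT:
            current = {"visitor": visitor, "start": ts, "end": ts, "pages": []}
            open_sessions[visitor] = current
            sessions.append(current)
        current["end"] = ts
        current["pages"].append(path)
    return sessions


def stickiness(sessions):
    """Return (visitor, seconds on site, pages viewed) for each session."""
    return [
        (s["visitor"], (s["end"] - s["start"]).total_seconds(), len(s["pages"]))
        for s in sessions
    ]
```

Note that time on site measured this way counts only the span between the first and last hit, so a one-page visit registers as zero seconds.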
Join the crowd
The Microsoft program will include the parsing and filtering utilities with Windows NT, Visual Studio tools and BackOffice applications such as SQL Server, based on customer requirements. Microsoft Consulting Services will train the solutions providers that will implement the Webhousing program, officials said.
Microsoft is not alone in eyeing the opportunity. Several smaller vendors, such as Accrue Software Inc., net.Genesis Corp. and WebSideStory Inc., offer software and services for low-end clickstream data analysis. And SAS Institute Inc. next month will announce SAS Solution for e-Intelligence, software that enables dot-com companies to analyze and respond to customer behavior on their Web sites.
The SAS system monitors and analyzes clickstream data, instructs Web servers to serve up particular pages to surfers based on their prior activity and presents other data particular to the customer, said SAS officials in Cary, N.C.
The software also enables IT organizations to combine clickstream data with conventional customer data to produce a more complete customer profile. Officials did not say when the software will ship.
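Combining clickstream data with conventional customer records amounts to a join on a shared customer identifier. The Python sketch below shows the idea in its simplest form. It is not SAS's implementation, and the field names (customer_id, pages_viewed, visits) and the function name are hypothetical. It also assumes that sessions have already been matched to known customers, for example through a login, which is often the hard part in practice.

```python
def build_profiles(customers, sessions):
    """Merge per-session clickstream totals into customer records.

    customers: dict mapping customer_id -> dict of conventional fields
    sessions:  iterable of dicts with "customer_id" and "pages_viewed"
    """
    # Start each profile from the conventional record, with zeroed Web totals.
    profiles = {
        cid: {**record, "visits": 0, "pages_viewed": 0}
        for cid, record in customers.items()
    }
    for session in sessions:
        profile = profiles.get(session["customer_id"])
        if profile is None:
            continue  # anonymous or unknown visitor; nothing to enrich
        profile["visits"] += 1
        profile["pages_viewed"] += session["pages_viewed"]
    return profiles
```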
Additional reporting by John S. McCright