Data mining: Digging user info for gold

Rachel Konrad | February 8, 2001 12:00 AM PST

You like science fiction books, and Amazon.com wants to sell them to you. So why does the e-commerce giant peddle DVDs, Q-Tips and Valentine's Day chocolates when you click on its site?

The answer is simple, scientists say: Amazon.com (amzn) and most other e-tailers have yet to perfect a practice known as "data mining," the use of statistical analysis to uncover hidden patterns in otherwise random information.

Experts predict data mining will be one of the most revolutionary developments of the next decade, key to delivering a "personal Web," tailored to an individual's preferences, by identifying a useful structure in collected information and analyzing it in real time. The influential MIT Technology Review recently hailed data mining as one of the 10 emerging technologies that will "change the world."

But some academics warn that mainstream mining merely "dumbs down" the sophisticated craft--and may result in screwy conclusions. Already, analysts are cautioning potential investors that the volatile segment may be unduly hyped.

"A lot of people think, 'I'm just going to put this in the hands of the marketer and we'll get the secret sauce,'" said Bob Moran, a managing vice president at the Boston-based Aberdeen Group. "But there's no such thing as 'secret sauce.' Data mining is all about pushing back the gray zone. It's never entirely uncovering the black and white."

But marketers who recognize its vast commercial potential see data mining as more than black and white. They also see green in the science's potential to create higher margins and inflate revenue.

Does it make sense?
Sophisticated or not, various forms of data-mining development are being undertaken by companies looking to make sense of the raw data that has been mounting relentlessly in recent years. A recent article in the Engineering News-Record noted that e-commerce has empowered companies to collect vast amounts of data on customers--everything from the number of Web surfers in a home to the value of the cars in their garage.

"Over the past few years, while (database) construction has gradually taken up digital information tools in pursuit of efficiency and profit, a by-product--mountains of recorded data--has been gathering," Tom Sawyer wrote in a November edition of the industry trade publication. "Now, the realization is spreading that the mountains are filled with gold."

About a dozen small data-mining companies are jockeying to gain market share, and database and software companies such as Oracle and IBM are edging into the field. Others are creating more automated data-mining applications for nonstatisticians, making the science more tangible to marketers and other algorithm-ignorant users.

Through data mining, marketers can target customers with personalized stock quotes, news updates, special promotions and other information they are most likely to use, dramatically reducing advertising budgets and boosting revenue. It is also entirely automated, reacting instantly to changes in a customer's behavior, unlike the vast majority of personalized services on the Web today that require people to fill out questionnaires.

Perhaps the biggest challenge for data mining is one that many experts say cannot be solved--and one that may justify skepticism about the entire niche. Data mining is a good predictor of consumer behavior based on past behavior--what people are likely to purchase based on previous transactions, demographic information and other data points. But, critics say, it will never be able to predict what people really want to buy.

For example, data mining can determine that a 34-year-old, home-owning woman with two children is likely to purchase a detached microwave every three years for the next decade. Yet it cannot determine that this particular consumer would rather purchase a more expensive integrated microwave-convection oven combination if it came vaguely into her price range.

Kyle Johnstone, director of business intelligence for Emerald Solutions, said figuring out what people would rather purchase, as opposed to what they merely settle for, is the key to inflating profit margins--the ultimate goal of marketers. The only way to do that is to ask people what they really want, as opposed to relying on previous spending habits.

"People will tell you they like steak, but when they have parties for the Fourth of July, they buy hamburger. There's a disconnect between what you buy and what you desire," Johnstone said. "You can figure out the behavior of performance metrics, but what you're missing--the biggest piece of the puzzle--is what it is that people really want... It's mathematically impossible to determine that."

Dancing around privacy
Most data-mining companies get customer information from the corporate clients that hire them to build and host their databases for fees that usually start at about $10,000 per month. The data miners skirt privacy concerns by keeping the information in-house.

They then crunch the data and send it back to the client in the form of spreadsheets, graphics, bar charts and other visual documents. Some data-mining companies also act as consultants, recommending to clients how to tweak Web pages for maximum effectiveness.

Few data-mining companies are willing to discuss real-world examples of how the craft has boosted sales or customers. But Usama Fayyad, a former Microsoft (msft) executive, who left the company to create Kirkland, Wash.-based DigiMine, said he used data mining to help revamp Microsoft's MSNBC.com Web site and boost readership.

Fayyad found that a 22 percent slice of MSNBC readers had nearly identical online behavior, clicking on exactly the same reports. But these users didn't fit into any of the company's five reader categories, which included political news-hounds, sports junkies and weather buffs.
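The article does not say which technique DigiMine used, but the general idea of spotting readers with nearly identical click behavior can be sketched in a few lines. The following is only an illustration under stated assumptions: the reader IDs, story IDs, the 0.8 similarity threshold and the function names jaccard and group_similar_readers are all hypothetical, and the greedy grouping shown here is a stand-in for whatever clustering method was actually applied.

```python
def jaccard(a, b):
    """Overlap between two sets of clicked stories (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar_readers(clicks, threshold=0.8):
    """Greedy single-pass grouping (an assumed, simplified method).

    Each reader joins the first group whose representative click set is
    at least `threshold` similar; otherwise the reader starts a new group.
    """
    groups = []  # list of (representative click set, [reader ids])
    for reader, stories in clicks.items():
        for representative, members in groups:
            if jaccard(representative, stories) >= threshold:
                members.append(reader)
                break
        else:
            groups.append((stories, [reader]))
    return [members for _, members in groups]


# Hypothetical click logs: reader id -> set of story ids clicked.
clicks = {
    "r1": {"story-101", "story-102", "story-103"},
    "r2": {"story-101", "story-102", "story-103"},
    "r3": {"story-201", "story-202"},
    "r4": {"story-101", "story-102", "story-103"},
    "r5": {"story-301", "story-302"},
}

groups = group_similar_readers(clicks)
largest = max(groups, key=len)
share = len(largest) / len(clicks)
print(f"Largest look-alike group: {largest} ({share:.0%} of readers)")
```

A group found this way is defined only by shared behavior, so it can cut across whatever reader categories a site has drawn up in advance; working out what the shared stories have in common remains a job for a human analyst.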

Fayyad, who holds a doctorate from the University of Michigan, said his company determined that the glue holding this mysterious group together was vaguely scandalous stories similar to those in gossip tabloids. MSNBC changed its format significantly to appeal to this large group, and now the home page is required to have at least one such feature per day. The research helped turn MSNBC's Living section into the site's most popular destination, Fayyad said.

"The lesson is that before data mining, they didn't know what was happening to a quarter of their database," Fayyad said. "If three or four shelves fall over in a brick-and-mortar store, the customers won't walk around them and the clerks will fix them. The equivalent is happening on the Web, but no one knows how to fix the bottlenecks."

Data mining makes inroads
For decades, utility companies have been using data mining to predict with some accuracy when generators are likely to fail. The technique started making more inroads into the corporate world in the 1990s, catching on as a means to detect fraud in the insurance, health care and credit card industries. By finding patterns and predicting likely behavior, companies can catch people who lie on applications or are likely to engage in dangerous or illegal activities.

So far, few general consumer e-tailers and content producers are fully exploiting data mining. That's partly because the practice--involving algorithms, samplings and parallelisms--is complicated and poorly understood. But it's starting to find its way into the mainstream.

"E-commerce is the newest and hottest use," said Michael Gilman, president and chief executive of Data Mining Technologies of Bethpage, N.Y. "Anywhere you have historical data, you can use it to get patterns that you can't see with the human eye."

One of the oldest and largest data-mining companies is the 25-year-old SAS Institute, based in Cary, N.C., which says it had already been working with 98 percent of Fortune 500 companies and is now targeting e-commerce. Retailers that sell products via catalogs and Web sites routinely increase their return on investment by more than 1,000 percent by using data mining, according to SAS statisticians.

"A lot of catalog companies were doing a fine business before, thank you very much," said Anne Milley, analytical strategist for SAS. "Then we came in and they were amazed. You look at who they're targeting, what they're sending and how often, and the frequency of repeat purchasers. You look at marketing mix--who is buying through catalogs, who is buying online--and figure out what is the optimal way to contact customers."

Data mining is likely to penetrate society further as the technology becomes easier to use.

San Mateo, Calif.-based Epiphany is one of several Web-based customer relationship-management companies that is deeply involved in data mining and is well known for its relatively easy-to-use tools.

George John, who has a doctorate in statistics from Stanford University and is the self-declared "data mining guru" of Epiphany, said the company's controversial simplification of data mining was intentional. He considers it one of Epiphany's biggest attributes when vying for business against other data-mining companies--which feature software that may be more sophisticated but is usually vastly more elusive to the average business.

"In the first generation of data mining at Epiphany, we tried to step back and see what business users would use it for--we knew they'd be asking lighter questions, where you wouldn't need 10 Ph.Ds forecasting profitability down to the penny," said John, an IBM veteran who began the data-mining program at Epiphany. "Every time we tried to make the (user interface) cleaner, we thought, 'Now the marketers will use it.' It was just paying attention to what people wanted to do."

Though it seems logical, the practice of simplifying data-mining results has its detractors. Fayyad and other experts warn that excessive simplification can skew results and lead executives to make pricing or inventory decisions based on faulty reasoning.

A more fundamental controversy is also brewing as data mining moves out of academia and into the corporate world: Academic statisticians take pride in their complex analyses, and many snub fellow Ph.Ds who enter corporate environments, calling them turncoats pandering to marketers.

John, the Epiphany guru, says he must constantly correct people who use the term "dumbing down" to refer to the company's color charts and other simple statistical diagrams. He prefers to call it "deeper penetration" of data mining into the ranks of marketers and other nonstatisticians.

"We profile a set of customers with nice charting, drawing pictures of what customers are like," John said, almost apologetically. "The key was admitting that was OK. It was OK if the technology behind it wouldn't get you a Nobel Prize."
