Data mining - what is it good for? Without a clue or lead of some kind, absolutely nothing. That's pretty much what data mining experts told Washington Post reporter Guy Gugliotta.
Details of the NSA's activities remain unclear, but data mining experts say they are puzzled about how the information might be used. . . . To discern suspicious call patterns from lists of dialed numbers, they will have to dig past the raw data into callers' identities, and, in the vast majority of cases, will find they have simply tapped into networks of law-abiding people involved in daily routines.
. . . "When they look at a map of phone numbers, they have no idea what's going on," said Valdis Krebs, an expert in deriving "social networks" from databases. "It might not be a bad person you find; it may be that the soccer team and the softball team are calling the same pizza parlor."
NSA Director Michael Hayden's testimony suggests the NSA is searching for phone records of known suspects, but you don't need data mining for that. "[IBM distinguished engineer Jeff] Jonas and others noted that tracking suspects' telephone records was a staple of good police work long before electronic search engines made it feasible to scan trillions of calls," the Post writes.
That process - good old police work - still works best, experts said. Going the other way - mining data for proof of suspicious activity and following it down to a suspect - is far more difficult, perhaps impossible.
"I'm sure the NSA is excellent at finding patterns and motifs in the data, but what do they mean?" Krebs asked. "Unless you start getting more information on the patterns, you're not going to be able to interpret them at all. Patterns alone won't tell you whether someone's good or evil."
Still, there are some intriguing possibilities.
"Suppose you looked at calls between two geographical points, and you could see what kind of pattern ordinary people had," said Olvi L. Mangasarian, co-director of the University of Wisconsin's Data Mining Institute. "Then you compare it to another pattern of calls that you know" are suspicious and try to develop a "classifier" -- a software tool -- to distinguish between them, he said. "It would be difficult -- but it would be doable."
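To make Mangasarian's "classifier" idea concrete, here is a minimal sketch in Python of one very simple kind, a nearest-centroid classifier. Everything in it is an assumption for illustration: the two features (calls per day, average call length in minutes), the made-up numbers, and the function names. Nothing here reflects what the NSA or the experts quoted actually use. Note that it only works because some patterns are already labeled as known-suspicious, which is exactly the condition Mangasarian describes.

```python
import math

# Hypothetical, invented feature rows: (calls per day, average call minutes).
# These numbers are synthetic and exist only to illustrate the idea.
ORDINARY = [(5, 3.0), (7, 2.5), (6, 4.0)]
KNOWN_SUSPICIOUS = [(40, 0.5), (35, 0.4), (45, 0.6)]


def centroid(rows):
    """Return the per-feature average of a list of feature rows."""
    n = len(rows)
    return tuple(sum(column) / n for column in zip(*rows))


def train(ordinary, suspicious):
    """Build a model: one average 'pattern' per labeled group."""
    return {
        "ordinary": centroid(ordinary),
        "suspicious": centroid(suspicious),
    }


def classify(model, row):
    """Label a new row by whichever group's average pattern is closest."""
    return min(model, key=lambda label: math.dist(row, model[label]))


model = train(ORDINARY, KNOWN_SUSPICIOUS)
print(classify(model, (8, 3.0)))
print(classify(model, (38, 0.7)))
```

Even in this toy form, the label only says which known pattern a new one resembles. As Krebs points out, it says nothing about what the pattern means.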
Ultimately, the people doing the mining would have to be "expert, even visionary," the Post writes, to be able to get signal out of the noise.
"Even if one out of 10 searches is a hit, the technique is useful," one expert said. "But one out of 1,000 or one in 1 million?" In these cases, experts suggest, the technician's time might be more cost-effectively spent searching something besides phone logs.
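The arithmetic behind that cost-effectiveness point can be sketched in a few lines of Python. The hit rates are the ones quoted above; the 30 minutes of analyst time per search is purely a made-up figure for illustration, as are the function names.

```python
from fractions import Fraction

# Assumption for illustration only: analyst minutes spent per search.
MINUTES_PER_SEARCH = 30


def searches_per_hit(hit_rate):
    """Average number of searches needed to get one hit."""
    if hit_rate <= 0:
        raise ValueError("hit_rate must be positive")
    return 1 / hit_rate


def minutes_per_hit(hit_rate, minutes_per_search=MINUTES_PER_SEARCH):
    """Average analyst minutes spent per hit at a given hit rate."""
    return minutes_per_search * searches_per_hit(hit_rate)


# The three hit rates mentioned in the quote.
for rate in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 1_000_000)):
    print(rate, searches_per_hit(rate), minutes_per_hit(rate))
```

Whatever the real cost per search is, the cost per hit grows in direct proportion as the hit rate falls, so going from one in 10 to one in 1 million multiplies the effort per useful result by 100,000.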