I've had domain-specific languages on my mind recently, so when a Wired News article on Hancock, a DSL for data mining "communities of interest," crossed through my feedreader, I spent some time digging into it.
The gist of the story is this: AT&T invented a language called Hancock (source code and binaries available) for analyzing data flows. Using Hancock and their own data, AT&T can "sift calling card records, long distance calls, IP addresses and internet traffic dumps, and even track the physical movements of mobile phone customers as their signal moves from cell site to cell site."
The amazing thing to me is that people are shocked that AT&T would do this! Leaving aside the legality of providing the data to the FBI--not something I'm arguing--companies mine their data all the time for various purposes. AT&T seemed to be using Hancock for two of them: marketing and preventing fraud.
The fraud angle interests me because of my work in the area of reputation. A 2001 research paper called "Communities of Interest" outlines how this works and introduces the term "guilt by association." By analyzing calling patterns to identify associates of known fraudsters, you find other fraudsters--and, of course, various innocent people as well.
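To make the idea concrete, here is a minimal Python sketch of guilt by association over call records. It is not Hancock code and not the algorithm from the paper; the function names, the sample phone numbers, and the 0.5 threshold are all my own assumptions for illustration. It scores each unknown number by the fraction of its contacts that are already known fraudsters.

```python
from collections import defaultdict


def build_contacts(call_records):
    """Map each number to the set of numbers it exchanged calls with."""
    contacts = defaultdict(set)
    for caller, callee in call_records:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return contacts


def suspicion_scores(call_records, known_fraudsters):
    """Score each unknown number by the share of its contacts that are known fraudsters.

    Hypothetical scoring rule chosen for illustration only.
    """
    known = set(known_fraudsters)
    scores = {}
    for number, peers in build_contacts(call_records).items():
        if number in known:
            continue  # already labeled, nothing to infer
        scores[number] = len(peers & known) / len(peers)
    return scores


# Made-up (caller, callee) pairs; real records would carry far more detail.
calls = [
    ("555-0100", "555-0101"),
    ("555-0100", "555-0102"),
    ("555-0103", "555-0101"),
    ("555-0103", "555-0104"),
    ("555-0105", "555-0104"),
]
fraudsters = {"555-0101", "555-0102"}

scores = suspicion_scores(calls, fraudsters)

# Assumed threshold: flag anyone with at least half their contacts known as fraudsters.
flagged = sorted(n for n, s in scores.items() if s >= 0.5)
print(flagged)
```

With this sample data, 555-0103 gets flagged even though half of its calls go to a number with no fraud ties at all, which is exactly how the innocent end up in the net.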
While this sort of thing probably wouldn't hold up in a court of law, as a method of preventing fraud it's tried and true. Credit card companies do similar things. A company I'm familiar with, iovation, builds systems that do this sort of thing in all kinds of sectors--they call it reputation and so do I. From their product page about online gaming:
iovation ReputationManager exposes hidden associations among fraudsters' devices and accounts to help iGaming sites:
- Stop fraud and shut down repeat offenders regardless of their fictitious identities
- Uncover fraud based on hidden associations among the fraudsters' devices and accounts
- Make fact-based decisions whether to allow or deny player transactions
In other words--guilt by association. It's powerful and it works, so why wouldn't companies use it? Any reputation system has privacy costs. I'm happy that my credit card company monitors my transactions to prevent fraud that might affect me. But I have no control over that--it's just done. I'm less thrilled with the phone company monitoring my call traffic so they can market to me better. Again, no choice for me. The question, I believe, is whether we can provide reputation systems that give users control over that trade-off.