Scott Lemon has been writing lately about what companies (like Google) know about you. In a post called "Google knows who you really are" he concludes:
Google knows you like no one else. Google knows more about you and I than we know about ourselves. Google will use this to provide us what we really want ... right? Google will do no evil ... right? Google would never use this data to use us ... to manipulate our undistinguished behaviors ... right? The Internet is here, and some things appear to be inevitable ...
AOL's release of users' search histories brings this concern into sharp focus. In a post showing what kind of data was exposed, Declan McCullagh gives some disturbing clues:
From that massive list of search terms, for instance, it's possible to guess that AOL user 710794 is an overweight golfer, owner of a 1986 Porsche 944 and 1998 Cadillac SLS, and a fan of the University of Tennessee Volunteers Men's Basketball team. The same user, 710794, is interested in the Cherokee County School District in Canton, Ga., and has looked up the Suwanee Sports Academy in Suwanee, Ga., which caters to local youth, and the Youth Basketball of America's Georgia affiliate.
That's pretty normal. What's not is that user 710794 also regularly searches for "lolitas," a term commonly used to describe photographs and videos of minors who are nude or engaged in sexual acts.
Lemon calls this information our "undistinguished identity": information about us compiled from our online behavior. The problem is that it's often wrong.
David Berlind cites Raul Valdes-Perez, CEO of the enterprise search provider Vivisimo, who calls this information "fool's gold." David says:
Just because I may end up recording or "on-demanding" Queer as Folk and Brokeback Mountain, does that mean that my cable TV provider can start to make certain assumptions about me, the additional programming I might want to see, and how to target advertising into that programming (not to mention how search on those titles could influence future search engine results)?
Great question. Every so often, I notice that some online business has erroneously concluded that I'm black or a woman or whatever. No harm done, of course, but it's interesting to note that someone probably paid a premium to show me an ad for something I'm unlikely to be interested in. Phishers seem to be getting better at this: I'm now getting phishing scams targeted by geographic location.
This discussion is related to the whole attention and intention discussion that happened at ETech last spring. How much control should you have over the data companies collect about you? After ETech, I concluded that most people want a lot of control over that data and that achieving it will be much harder than we think: technically, legally, economically, and politically.
Another reason this discussion interests me so much is the work I've been doing on reputation systems. Reputation is largely based on linking identities to the information connected with them. Building reputation systems that respect people's rights requires following certain principles. We drafted a list of them at the Berkman identity mashup last June. I'm not convinced we have them all, but we've got a good start.