Open-source tools wrest control of personal data

Adriana Lukas explains how the open-source Mine project will help people regain control of data held about them on the web
Written by Cath Everett, Contributor

In today's information age, personal data is possibly one of the most valuable assets an individual can own. But swathes of it are being gathered and held by companies and public sector bodies, often with little apparent benefit to the person to whom it belongs.

Even though such information, accurate or otherwise, relates directly to them, individuals are unable to access it easily and are certainly not able to capture, exploit or share it with others for their own ends.

It is this imbalance that social media guru Adriana Lukas is attempting to redress. After listening to Doc Searls, a fellow at the Berkman Center for Internet and Society at Harvard University, she started working with his ideas on vendor relationship management (VRM) in October 2006.

VRM is a community-driven project that is intended to turn the notion of customer relationship management (CRM) on its head. Its aim is to create a standards-based framework from which online tools and services can be developed so that consumers regain control of their personal data on the web.

Within a year or so, such concepts had led Lukas to set up an open source project called Mine as a vehicle for putting such principles into practice.

Alpha code has just been released, written by lead developer Alec Muffett, a Sun Microsystems alumnus who gained notoriety in the late 1980s after developing the Crack Unix password-cracking program and releasing it on Usenet.

The Mine project also has about half a dozen other active contributors, with a further 20 to 30 on a list waiting to join. ZDNet UK caught up with Lukas to find out about the latest developments at the project.

Q: Where does the Mine project stand today?
A: Alpha code is out there, so developers can download and play with it, but today it's like a car with an engine and levers, but no bodywork or dashboard. So to do anything with it at the moment, you have to be a pretty significantly 'geeky geek'.

But we hope to have a beta for early adopters and developers with some kind of usable basic HTML interface by late November that will run on OS X and Ubuntu. We're about a third of the way there now, but the interface will make or break this, so we have to get it right and we're still looking for additional user-interface experts.

The idea is to lower the 'geek barrier' to the point where you only need to be a moderately advanced technophile to set up a Mine of some form. The basic functionality will be there and it'll be usable, so people will be able to build 'objects', create feeds, upload pictures, play with tags and the like. But it won't be pretty and smooth — it will be functional.

We're going for a heatwave adoption strategy, so we're not aiming it at everybody. That's why the open source model is so important. The first users will be developers who can scratch an itch by using Mine in its basic form, but who then can help improve it for the next wave of users.

The second wave will probably be the same social web early adopters who drove blogging, micro-blogging and other web applications that are now mainstream. They care about user autonomy on the web, privacy, data sharing and that kind of thing, and so they'll also improve on it until it's in a position where the average consumer can use it. So it's about concentric circles.

Q: What will Mine look like and what will it enable users to do?
A: Mine is essentially infrastructure software, but it will look something like a Google dashboard and do three things. The first is to allow you to capture, reclaim and bring together all the data you generate using tools such as Twitter, Flickr and Delicious, which will be stored in your own database that no-one else has access to. It's Mine as in a pickaxe that you use to mine things, but also in the sense that you own something.

The second bit is about enabling you to analyse, manipulate and display the raw data in ways that you can't do now — the academic term is personal informatics. So, if you have a credit card statement or current account, you can tag each entry — whether it's groceries or travel — and analyse how much you're spending on it. And by mining data, you can add value that no-one else can, because it's yours and so you understand the context.
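
The tagging idea Lukas describes can be sketched in a few lines of Python. This is only an illustration, not Mine code: the entries, the tag names and the spending_by_tag helper are all invented for the example, and it assumes an entry may carry more than one tag.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical statement entries: (description, amount, tags).
entries = [
    ("Supermarket", Decimal("42.10"), {"groceries"}),
    ("Train ticket", Decimal("18.50"), {"travel"}),
    ("Corner shop", Decimal("7.90"), {"groceries"}),
    ("Airport cafe", Decimal("6.00"), {"travel", "groceries"}),
]


def spending_by_tag(entries):
    """Sum the amounts under every tag attached to each entry."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for _description, amount, tags in entries:
        for tag in tags:
            totals[tag] += amount
    return dict(totals)


totals = spending_by_tag(entries)
for tag, total in sorted(totals.items()):
    print(f"{tag}: {total}")
```

Because an entry with two tags is counted under both, the per-tag totals can add up to more than the statement total. That is a choice made for this sketch, not something the project specifies.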

The third element involves sharing your information and this is where the Mine innovation really comes in. You can use Mine feeds to share data, whether it has been processed by you or not, with different people at different levels.

So it's not like a blog, which has a feed that everyone subscribes to and they all see the same thing. It's about generating a feed that's specifically tailored to your interests, and you can share your detailed knowledge of wine or restaurants or whatever in a controllable way.
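
One way to picture a feed that differs per reader is a filter over tagged items. The sketch below is an assumption about how that could work, not the Mine design: the Item and Subscriber classes, the allowed_tags field and the feed_for function are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    tags: frozenset


@dataclass
class Subscriber:
    name: str
    # Tags the owner has chosen to share with this particular subscriber.
    allowed_tags: set = field(default_factory=set)


def feed_for(subscriber, items):
    """Return only the titles whose tags overlap what this subscriber may see."""
    return [item.title for item in items if item.tags & subscriber.allowed_tags]


items = [
    Item("Notes on a 2005 Rioja", frozenset({"wine"})),
    Item("New bistro near the office", frozenset({"restaurants"})),
    Item("Holiday photos", frozenset({"family"})),
]
friend = Subscriber("friend", {"wine", "restaurants", "family"})
vendor = Subscriber("wine merchant", {"wine"})

print(feed_for(friend, items))
print(feed_for(vendor, items))
```

Here the friend sees all three items while the wine merchant sees only the wine note, which is the contrast with a blog's single shared feed.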

But your friends won't actually need to use Mine themselves if they don't want to, because they'll also be able to receive feeds in Google Reader. The aim is not to invent anything new until it's absolutely necessary, so instead we're trying to tap into what's already out there to keep it simple and encourage adoption.

Another scenario is that you might be interested in sharing a feed with vendors that will subscribe to it to see what you're saying. And if that's scaled up across lots of customers, they'll get a picture of consumption that they're not normally able to obtain and they'll be able to learn from that information. So it's customer analytics coming from the other end.

But if they abuse it, for example, by spamming you or analysing your data without giving anything back, you can cut off the feed. The feed is a proxy for relationships and there's no point if there's no reciprocity. Vendors will have to learn that and that's the social engineering bit.
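
Cutting off a feed could be modelled as revoking a per-subscriber token. This is a guess at a mechanism, offered only to make the idea concrete: the FeedRegistry class and its grant, revoke and fetch methods are hypothetical, and nothing in the interview says Mine uses tokens.

```python
import secrets


class FeedRegistry:
    """Hypothetical registry: each subscriber gets an unguessable feed token."""

    def __init__(self):
        self._subscribers = {}  # token -> subscriber name

    def grant(self, name):
        token = secrets.token_urlsafe(16)
        self._subscribers[token] = name
        return token

    def revoke(self, token):
        # Forgetting the token cuts off that one subscriber and nobody else.
        self._subscribers.pop(token, None)

    def fetch(self, token, items):
        if token not in self._subscribers:
            return None  # unknown or revoked token: no feed
        return list(items)


registry = FeedRegistry()
vendor_token = registry.grant("vendor")
friend_token = registry.grant("friend")

registry.revoke(vendor_token)  # the vendor abused the relationship
```

After the revoke call, the vendor's token no longer returns anything, while the friend's feed carries on unchanged.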

Q: Will Mine be packaged as a single, unified platform?
A: The Mine project isn't just about producing software and we're not proposing to come up with a branded application, such as Flickr or Facebook. Instead, it's supposed to be more of a phenomenon like emailing or blogging.

The whole point is that there should be multiple instances of different Mines. So, you might see a Mine web page, phone application, a plug-in integrated with Firefox, a newspaper or personal diary. Because a Mine, like a blog, is a personal platform, how it looks is important to the user, but it will not necessarily be the same for everyone.

We don't want to invent some mysterious new technology and require everyone to learn about and adopt it, and so we're taking existing technologies and bending them by 90 degrees. It's more about giving people the ingredients so that they can rearrange them in any way they want.

We'll develop the infrastructure and other people can develop different forms of Mine, as well as feed readers and open source, free and commercial applications that will plug into the platform and enable you to do things like analyse your financial history. Mine will provide the specs and the representational state transfer, or REST, API and we'll release the code under the Apache licence.

But Alec [Muffett] also plans to come up with a couple of applications himself, such as geo-location to let people determine your position, and he'll maybe port Google Calendar into Mine so that people can share information, rather than having to type in a URL.

Q: On what technology is Mine based?
A: In 2008, we developed a proof-of-concept prototype in Perl, but this year, we've been reimplementing and improving it using the Python language and the Django web framework. The new PyMine implementation has 4,000-plus lines of code and is on a rolling release schedule, with updates almost daily while it's in alpha.

The next step will be to port it to the Google App Engine, as it gives developers 500MB of free storage capacity. This means that they'll be able to download and host it on a server so they can experiment with it. We'll also help people run it on Amazon's Elastic Compute Cloud or on their own systems.

But we're also designing what we've called the Mine Mesh, which is essentially a Usenet using bounce messaging in a similar way to the BitTorrent protocol. BitTorrent chops up movies into small chunks and distributes them across the computers of subscribers so that they each store a bit locally. But we're turning the idea upside down in that Mine will encrypt your data so no-one knows how to put it together apart from you. I don't think we'll be able to do that in the first release, though — it's more likely to be 1.5 or 2.0.
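
The chunk-and-scatter idea can be illustrated with a toy, with heavy caveats. The Mine Mesh was still at the design stage, so everything here is an assumption: the scatter and gather functions, the chunk size and the scheme itself are invented for the example. The XOR keystream built from HMAC-SHA256 is a teaching stand-in for a real cipher and should not be used to protect actual data.

```python
import hashlib
import hmac

CHUNK_SIZE = 8  # bytes; tiny so the example is easy to follow


def _prf(key: bytes, label: str) -> bytes:
    """Keyed pseudo-random bytes for a label (HMAC-SHA256)."""
    return hmac.new(key, label.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, index: int, length: int) -> bytes:
    """Toy per-chunk keystream; a stand-in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += _prf(key, f"stream:{index}:{counter}")
        counter += 1
    return out[:length]


def scatter(data: bytes, key: bytes) -> dict:
    """Split data into chunks, scramble each, and give it an opaque name."""
    stored = {}
    for index, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        chunk = data[start:start + CHUNK_SIZE]
        pad = _keystream(key, index, len(chunk))
        name = _prf(key, f"name:{index}").hex()  # hides the chunk's position
        stored[name] = bytes(a ^ b for a, b in zip(chunk, pad))
    return stored


def gather(stored: dict, key: bytes) -> bytes:
    """Recompute the names in order with the key and undo the scrambling."""
    parts = []
    index = 0
    while True:
        name = _prf(key, f"name:{index}").hex()
        if name not in stored:
            break
        pad = _keystream(key, index, len(stored[name]))
        parts.append(bytes(a ^ b for a, b in zip(stored[name], pad)))
        index += 1
    return b"".join(parts)


data = b"my private bookmarks and photos"
key = b"owner-secret"
stored = scatter(data, key)  # these entries could live on other people's machines
```

Only someone holding the key can work out which stored entries belong together and in what order, so gather(stored, key) rebuilds the original, while a different key finds none of the chunk names.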

The problem with file-sharing systems is that they often don't work very well due to a lack of decent cryptography and critical mass. So to work effectively, there have to be enough people to distribute the data to so that their systems can store chunks of it. On the cryptography side of things, we're currently looking at [distributed storage encryption system] Tahoe, as well as other options.

As a stop-gap measure, we may need to find a contributor that can provide us with hosting capabilities. A peer-to-peer system would be my preferred choice, but if its absence is starting to hinder adoption, we'd have to find a way around it, although I'd be very vocal that it was only a stop-gap and that people would need to import their data to the new system when it became available.

It really doesn't make sense to use a hosting company or to create a Facebook kind of service for Mine, because the aim of all this is to enable users to become autonomous nodes and to give them the tools to take their data back under their own control — not hand it over to someone else.
