The 'truthiness' of mashups

One of the classic problems that enterprise integration efforts face is the provenance and accuracy of information aggregated from external sources.  If the source information is wrong, how do you tell?
Written by Dion Hinchcliffe, Contributor

One of the classic problems that enterprise integration efforts face is the provenance and accuracy of information aggregated from external sources.  If the source information is wrong, how do you tell? And how do you figure out who exactly is at fault?  When incorrect information flows through an endless information ecosystem, does it just get recirculated and amplified ad infinitum?  Not a pleasant prospect.

Unfortunately, mashups only exacerbate this situation.  Heck, they probably bring it to the forefront.  The direction the Web is taking, with increasingly pervasive syndication, a proliferation of pure information services that encourage information remixing, and ready information aggregators of every description popping up all over the Web, and the resulting network effects will just amplify the issue.

Nicholas Carr, the well known author of Does IT Matter? and a sometimes critic of Web 2.0, wrote yesterday about something he calls the 'truthiness' of Web 2.0-style mashups:

Entrepreneurs are launching all sorts of sites and services that are built on data that they're siphoning out of third-party sites and databases. Sometimes, the secondhand data is good; sometimes, it's not. The process of chopping up and bundling data from many different sources can, moreover, amplify inaccuracies. Combine bad data from two different sources and you may get bad data squared. Unfortunately, to the user, the inaccuracies are invisible.

Carr concludes that if mashups end up providing too much unreliable information, users will rebel against them.  And certainly, we'd only need a few well-publicized events, like the Zillow example he cites, for mashups to be relegated as second-tier citizens on the Web.  But letting mashups sit on the bench for the next cycle of Web innovation would be very unfortunate indeed; the promise of these techniques are just too great.

The good news is that there are concrete things mashup creators can doto instill trust, or at least a sense of trust.  The 'truthiness' that Carr talks about is the appearance of accuracy instead of the real thing.  The fact is, it's pretty easy for most mashup designers to understand where they lose control over accuracy.  One place is whenever information is gathered in the Web client, whether the client is the one you provide or the client that your mashup sources provide.  Understanding where your sources get their information is definitely part of the solution.


The other place is in your mashup information sources.  You know they could be unreliable, you just don't have any control over how. Due to your continued experience with them, you might even know in what ways their information isn't precise.  This is just the sort of thing your users will want to know and you should share it with them.  While rating the reliability of information is a slippery slope that has tripped up many an integration effort, it can't hurt and is probably essential to flag information as being from an 'external source', thereby advertising at the very least that it's not your fault.  Even better, provide direct traceability to the source for possible correction.

Of course, the reality is that more and more information will be contributed by the users of Web systems, inside and outside the firewall.  This is the harnessing collective intelligence piece of Web 2.0 that is so powerful.  Data that is relevant and current will increasingly come from these aggregate, online contributions instead of traditional, old-school databases.  This lets mashup creators leverage the contributions of thousands, millions, and eventually billions of users. But it's a blade that cuts both ways and exposes all of us to likely information inaccuracy.  But like Carr's example with Zillow real estate information, inaccurate information can even come from the most traditional sources.

The real point of all of this is not that mistakes will always happen but that network effects make them much more pronounced.  One mashup source might have a wrong bit of information but when that source is in turn used by a hundred mashups, the errors get propagated, amplified, and perpetuated on a mass scale.  Providing provenance of mashed up information, being transparent about your sources, and enabling source traceability is likely the only solution today, and I encourage you to provide it.

Editorial standards